Fast Convergence in Learning Two-Layer Neural Networks with Separable Data
Abstract
Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which include the exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum. This is made possible by showing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study the generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.
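To make the algorithm in the abstract concrete, below is a minimal toy sketch of normalized GD on a small two-layer net with separable data. This is illustrative only, not the paper's exact setup: the architecture (tanh activations, fixed random second-layer weights), the normalization used here (dividing the gradient by its Euclidean norm), the data, and the step size are all assumptions for the demo.

```python
import numpy as np

# Toy linearly separable data: label is the sign of the first coordinate.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sign(X[:, 0])

# Two-layer net: m hidden tanh units, second-layer weights fixed at random
# (a common simplification in analyses; an assumption in this sketch).
m = 16
W = rng.normal(size=(m, 2)) / np.sqrt(m)
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)

def forward(W):
    """Network output f(x) = a^T tanh(W x) for every sample."""
    return np.tanh(X @ W.T) @ a

def loss(W):
    """Logistic loss, an exponentially-tailed loss."""
    return np.mean(np.log1p(np.exp(-y * forward(W))))

def grad(W):
    """Gradient of the logistic loss w.r.t. first-layer weights W."""
    s = -y / (1.0 + np.exp(y * forward(W))) / len(y)  # dL/df per sample
    dact = 1.0 - np.tanh(X @ W.T) ** 2                # tanh'(pre-activation)
    return ((s[:, None] * dact) * a[None, :]).T @ X   # shape (m, 2)

initial_loss = loss(W)
eta = 0.05
for _ in range(300):
    g = grad(W)
    # Normalized GD step: fixed-length move in the gradient direction.
    W = W - eta * g / (np.linalg.norm(g) + 1e-12)
final_loss = loss(W)
```

Because the step length is fixed rather than scaled by the (vanishing) gradient magnitude, the weight norm keeps growing on separable data, which is what drives the fast loss decrease that plain GD lacks in this regime.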
Cite
Text
Taheri and Thrampoulidis. "Fast Convergence in Learning Two-Layer Neural Networks with Separable Data." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I8.26186
Markdown
[Taheri and Thrampoulidis. "Fast Convergence in Learning Two-Layer Neural Networks with Separable Data." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/taheri2023aaai-fast/) doi:10.1609/AAAI.V37I8.26186
BibTeX
@inproceedings{taheri2023aaai-fast,
title = {{Fast Convergence in Learning Two-Layer Neural Networks with Separable Data}},
author = {Taheri, Hossein and Thrampoulidis, Christos},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {9944-9952},
doi = {10.1609/AAAI.V37I8.26186},
url = {https://mlanthology.org/aaai/2023/taheri2023aaai-fast/}
}