Fast Convergence in Learning Two-Layer Neural Networks with Separable Data

Abstract

Normalized gradient descent has shown substantial success in speeding up the convergence of exponentially-tailed loss functions (which include the exponential and logistic losses) on linear classifiers with separable data. In this paper, we go beyond linear models by studying normalized GD on two-layer neural nets. We prove for exponentially-tailed losses that using normalized GD leads to a linear rate of convergence of the training loss to the global optimum. This is made possible by establishing certain gradient self-boundedness conditions and a log-Lipschitzness property. We also study the generalization of normalized GD for convex objectives via an algorithmic-stability analysis. In particular, we show that normalized GD does not overfit during training by establishing finite-time generalization bounds.
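To make the setting concrete, below is a minimal sketch of a normalized GD step on a toy two-layer network trained with the logistic loss on linearly separable data. This is an illustration only, not the paper's exact construction: the softplus activation, the fixed ±1 output layer, and normalizing the step by the gradient norm are assumptions made here for concreteness; the paper's precise normalization and architecture may differ.

```python
# Toy example: normalized gradient descent on a two-layer net with logistic
# loss and linearly separable data. Illustrative sketch, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

# Linearly separable toy data: label is the sign of the first feature.
n, d, m = 200, 5, 32                                # samples, input dim, width
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])

W = rng.standard_normal((m, d)) / np.sqrt(d)        # trainable first layer
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed output weights (assumed)

def loss_and_grad(W, X, y):
    """Logistic loss and its gradient w.r.t. the first-layer weights W."""
    Z = X @ W.T                                     # pre-activations, (n, m)
    H = np.logaddexp(0.0, Z)                        # softplus activation
    f = H @ a                                       # network outputs, (n,)
    margins = y * f
    loss = np.mean(np.logaddexp(0.0, -margins))     # logistic loss
    g = -y / (1.0 + np.exp(margins)) / len(y)       # dLoss/df_i, (n,)
    S = 1.0 / (1.0 + np.exp(-Z))                    # softplus derivative
    grad_W = a[:, None] * ((g[:, None] * S).T @ X)  # (m, d)
    return loss, grad_W

eta = 0.5
for t in range(200):
    loss, grad = loss_and_grad(W, X, y)
    gnorm = np.linalg.norm(grad)
    W = W - eta * grad / max(gnorm, 1e-12)          # normalized GD step
print(f"final training loss: {loss:.2e}")
```

In contrast to plain GD, whose effective step shrinks as the gradient of an exponentially-tailed loss vanishes on separable data, the normalization keeps the update size roughly constant, which is what drives the fast decrease of the training loss in this regime.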

Cite

Text

Taheri and Thrampoulidis. "Fast Convergence in Learning Two-Layer Neural Networks with Separable Data." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I8.26186

Markdown

[Taheri and Thrampoulidis. "Fast Convergence in Learning Two-Layer Neural Networks with Separable Data." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/taheri2023aaai-fast/) doi:10.1609/AAAI.V37I8.26186

BibTeX

@inproceedings{taheri2023aaai-fast,
  title     = {{Fast Convergence in Learning Two-Layer Neural Networks with Separable Data}},
  author    = {Taheri, Hossein and Thrampoulidis, Christos},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {9944--9952},
  doi       = {10.1609/AAAI.V37I8.26186},
  url       = {https://mlanthology.org/aaai/2023/taheri2023aaai-fast/}
}