Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks

Abstract

We study the overparametrization bounds required for the global convergence of the stochastic gradient descent (SGD) algorithm for a class of one-hidden-layer feed-forward neural networks equipped with the ReLU activation function. We improve the existing state-of-the-art results in terms of the required hidden-layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network.
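To make the setting concrete, the sketch below shows a generic one-hidden-layer ReLU network trained with single-sample SGD on a squared loss. It is only an illustration of the architecture the abstract refers to: the toy data, width m, step size, and initialization are placeholder assumptions and are not the specific scheme or overparametrization bound analyzed in the paper.

import numpy as np

# Minimal sketch (assumed toy setup, not the paper's exact initialization or bounds).
rng = np.random.default_rng(0)

n, d, m = 32, 10, 256                              # samples, input dimension, hidden-layer width
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # unit-norm inputs (common normalization)
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))                    # random hidden-layer weights
a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)   # fixed output weights, a common convention

def predict(x):
    # f(x) = a^T ReLU(W x): one hidden layer, ReLU activation, linear output
    return a @ np.maximum(W @ x, 0.0)

eta = 0.05                                         # step size (illustrative)
for step in range(2000):
    i = rng.integers(n)                            # single-sample SGD
    x_i, y_i = X[i], y[i]
    h = W @ x_i
    err = a @ np.maximum(h, 0.0) - y_i
    # gradient of 0.5 * err^2 w.r.t. W; ReLU derivative is the indicator h > 0
    grad_W = err * np.outer(a * (h > 0), x_i)
    W -= eta * grad_W

loss = 0.5 * np.mean([(predict(x) - t) ** 2 for x, t in zip(X, y)])
print(f"final training loss: {loss:.4f}")

In this kind of setup, the question studied by the paper is how large the width m must be (relative to the sample size and other problem parameters) for such SGD iterations to drive the training loss to zero from a random initialization.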

Cite

Text

Polaczyk and Cyranka. "Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks." Transactions on Machine Learning Research, 2023.

Markdown

[Polaczyk and Cyranka. "Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks." Transactions on Machine Learning Research, 2023.](https://mlanthology.org/tmlr/2023/polaczyk2023tmlr-improved/)

BibTeX

@article{polaczyk2023tmlr-improved,
  title     = {{Improved Overparametrization Bounds for Global Convergence of SGD for Shallow Neural Networks}},
  author    = {Polaczyk, Bartłomiej and Cyranka, Jacek},
  journal   = {Transactions on Machine Learning Research},
  year      = {2023},
  url       = {https://mlanthology.org/tmlr/2023/polaczyk2023tmlr-improved/}
}