An Improved Analysis of Training Over-Parameterized Deep Neural Networks

Abstract

A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the network width required to ensure global convergence is very stringent: it is typically a high-degree polynomial in the training sample size $n$ (e.g., $O(n^{24})$). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis are (a) a tighter gradient lower bound that leads to faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm's iterates. When specialized to two-layer (i.e., one-hidden-layer) neural networks, our result also yields a milder over-parameterization condition than the best-known result in prior work.
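To see how the two technical ingredients fit together, the following is an illustrative sketch of the proof template that such convergence analyses typically follow; the notation (the constants $\mu$, $\eta$, the radius $R$, and the width $m$) is an assumption for exposition, not the paper's exact statements. Suppose that for every weight configuration $W$ within distance $R$ of the random initialization $W_0$, the training loss satisfies a gradient lower bound of Polyak-Łojasiewicz type,

$$\|\nabla L(W)\|_F^2 \ge \mu \, L(W) \quad \text{whenever } \|W - W_0\|_F \le R,$$

and gradient descent runs $W_{t+1} = W_t - \eta \nabla L(W_t)$. Then, for a sufficiently small step size $\eta$, one obtains a linear rate

$$L(W_{t+1}) \le \left(1 - \tfrac{\eta \mu}{2}\right) L(W_t),$$

provided the iterates stay inside the ball of radius $R$, which is verified through a trajectory-length bound of the form

$$\|W_t - W_0\|_F \le \sum_{s < t} \eta \, \|\nabla L(W_s)\|_F \le R.$$

Under this template, a tighter gradient lower bound (larger $\mu$) gives a faster contraction, and a sharper trajectory-length bound lets the analysis work within a smaller radius $R$; both relax the width $m$ needed to guarantee that the gradient lower bound holds along the entire trajectory, which is the sense in which the over-parameterization condition is milder.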

Cite

Text

Zou and Gu. "An Improved Analysis of Training Over-Parameterized Deep Neural Networks." Neural Information Processing Systems, 2019.

Markdown

[Zou and Gu. "An Improved Analysis of Training Over-Parameterized Deep Neural Networks." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/zou2019neurips-improved/)

BibTeX

@inproceedings{zou2019neurips-improved,
  title     = {{An Improved Analysis of Training Over-Parameterized Deep Neural Networks}},
  author    = {Zou, Difan and Gu, Quanquan},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {2055-2064},
  url       = {https://mlanthology.org/neurips/2019/zou2019neurips-improved/}
}