SGD Converges to Global Minimum in Deep Learning via Star-Convex Path

Abstract

Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a variety of deep neural networks. However, there is still a lack of understanding of how and why SGD can train these complex networks towards a global minimum. In this study, we establish the convergence of SGD to a global minimum for nonconvex optimization problems that are commonly encountered in neural network training. Our argument exploits the following two important properties: 1) the training loss can achieve (approximately) zero value, which has been widely observed in deep learning; 2) SGD follows a star-convex path, which is verified by various experiments in this paper. In this context, our analysis shows that SGD, although long considered a randomized algorithm, converges in an intrinsically deterministic manner to a global minimum.
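The two properties named in the abstract can be illustrated on a toy problem. The sketch below is not the paper's experimental setup: it assumes an over-parameterized least-squares model (so zero training loss is attainable), uses a hypothetical helper `sgd_star_convex_check`, and tests one common form of the star-convexity inequality along the SGD path, namely $\ell_i(x^*) \ge \ell_i(x_k) + \langle \nabla \ell_i(x_k),\, x^* - x_k \rangle$ for the sample $i$ drawn at step $k$. Because each per-sample loss here is convex, the inequality holds by construction; for a neural network it would have to be checked empirically. The step size and the reshuffled sampling order are likewise assumptions.

```python
import numpy as np

def sgd_star_convex_check(n=20, d=50, epochs=200, seed=0):
    """Run SGD on an over-parameterized least-squares toy problem and
    record the star-convexity gap at every step (illustrative only)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n, d))      # n samples, d > n parameters
    y = A @ rng.standard_normal(d)       # labels that can be fit exactly
    x_star = np.linalg.pinv(A) @ y       # a global minimizer with zero loss
    # Assumed step size: at most 1 / ||a_i||^2 for every sample i.
    lr = 1.0 / np.max(np.sum(A ** 2, axis=1))

    def sample_loss(x, i):
        return 0.5 * (A[i] @ x - y[i]) ** 2

    def full_loss(x):
        return 0.5 * np.mean((A @ x - y) ** 2)

    x = np.zeros(d)
    initial_loss = full_loss(x)
    gaps = []
    dists = [np.linalg.norm(x - x_star)]
    for _ in range(epochs):
        for i in rng.permutation(n):     # one pass over reshuffled samples
            grad = (A[i] @ x - y[i]) * A[i]
            # Star-convexity gap at x_k for the sampled loss; >= 0 means
            # the inequality holds at this step.
            gap = sample_loss(x_star, i) - sample_loss(x, i) - grad @ (x_star - x)
            gaps.append(gap)
            x = x - lr * grad
            dists.append(np.linalg.norm(x - x_star))

    return {
        "initial_loss": initial_loss,
        "final_loss": full_loss(x),
        "min_gap": min(gaps),
        "distance_nonincreasing": bool(np.all(np.diff(dists) <= 1e-12)),
    }

result = sgd_star_convex_check()
print(result)
```

In this toy setting the distance to the minimizer does not increase at any step regardless of which sample is drawn, which gives a small-scale picture of the "intrinsically deterministic" behavior the abstract describes.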

Cite

Text

Zhou et al. "SGD Converges to Global Minimum in Deep Learning via Star-Convex Path." International Conference on Learning Representations, 2019.

Markdown

[Zhou et al. "SGD Converges to Global Minimum in Deep Learning via Star-Convex Path." International Conference on Learning Representations, 2019.](https://mlanthology.org/iclr/2019/zhou2019iclr-sgd/)

BibTeX

@inproceedings{zhou2019iclr-sgd,
  title     = {{SGD Converges to Global Minimum in Deep Learning via Star-Convex Path}},
  author    = {Zhou, Yi and Yang, Junjie and Zhang, Huishuai and Liang, Yingbin and Tarokh, Vahid},
  booktitle = {International Conference on Learning Representations},
  year      = {2019},
  url       = {https://mlanthology.org/iclr/2019/zhou2019iclr-sgd/}
}