Escaping Saddles with Stochastic Gradients

Abstract

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients indeed exhibit a strong component along these directions. Furthermore, we show that - contrary to the case of isotropic noise - this variance is proportional to the magnitude of the corresponding eigenvalues and does not decrease with the dimensionality. Based upon this observation, we propose a new assumption under which we show that the injection of explicit, isotropic noise usually applied to make gradient descent escape saddle points can successfully be replaced by a simple SGD step. Additionally - and under the same condition - we derive the first convergence rate for plain SGD to a second-order stationary point in a number of iterations that is independent of the problem dimension.
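
The idea can be illustrated with a minimal sketch (assumed names and thresholds, not the paper's exact algorithm): run gradient descent, and whenever the full gradient is small - i.e. near a possible saddle point - take a single stochastic gradient step in place of the usual isotropic noise injection. The callables grad_full and grad_stochastic, the step size lr, and the threshold grad_tol below are hypothetical placeholders; in practice a mini-batch gradient of a finite-sum objective would play the role of grad_stochastic.

import numpy as np

def escape_saddles_with_sgd(grad_full, grad_stochastic, x0,
                            lr=0.05, grad_tol=1e-3, max_iters=10_000):
    # Illustrative sketch only: gradient descent that replaces the usual
    # isotropic-noise perturbation near stationary points with one SGD step.
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(max_iters):
        g = grad_full(x)
        if np.linalg.norm(g) > grad_tol:
            # Large gradient: take an ordinary full-gradient descent step.
            x = x - lr * g
        else:
            # Small gradient: possibly at a saddle. Instead of adding isotropic
            # noise, take one stochastic gradient step; its component along
            # negative-curvature directions provides the escape direction.
            x = x - lr * grad_stochastic(x)
    return x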

Cite

Text

Daneshmand et al. "Escaping Saddles with Stochastic Gradients." International Conference on Machine Learning, 2018.

Markdown

[Daneshmand et al. "Escaping Saddles with Stochastic Gradients." International Conference on Machine Learning, 2018.](https://mlanthology.org/icml/2018/daneshmand2018icml-escaping/)

BibTeX

@inproceedings{daneshmand2018icml-escaping,
  title     = {{Escaping Saddles with Stochastic Gradients}},
  author    = {Daneshmand, Hadi and Kohler, Jonas and Lucchi, Aurelien and Hofmann, Thomas},
  booktitle = {International Conference on Machine Learning},
  year      = {2018},
  pages     = {1155--1164},
  volume    = {80},
  url       = {https://mlanthology.org/icml/2018/daneshmand2018icml-escaping/}
}