On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems

Abstract

In this paper, we analyze the trajectories of stochastic gradient descent (SGD) with the aim of understanding their convergence properties in non-convex problems. We first show that the sequence of iterates generated by SGD remains bounded and converges with probability $1$ under a very broad range of step-size schedules. Subsequently, we prove that the algorithm's rate of convergence to local minimizers with a positive-definite Hessian is $O(1/n^p)$ if the method is run with a $\Theta(1/n^p)$ step-size schedule. This provides an important guideline for tuning the algorithm's step-size, as it suggests that a cool-down phase with a vanishing step-size could lead to significant performance gains; we demonstrate this heuristic using ResNet architectures on CIFAR. Finally, going beyond existing positive-probability guarantees, we show that SGD avoids strict saddle points/manifolds with probability $1$ for the entire spectrum of step-size policies considered.
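As a minimal illustration of the $\Theta(1/n^p)$ step-size schedules discussed in the abstract (not the authors' experimental setup), the sketch below runs plain SGD on a toy non-convex objective with a noisy gradient oracle; the objective, noise model, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's code): SGD with a Theta(1/n^p) step-size
# schedule on the toy non-convex objective f(x) = sum(x^4 - x^2) / 4,
# whose gradient is queried through a noisy oracle.

rng = np.random.default_rng(0)

def grad(x):
    # Exact gradient of the toy objective: local minimizers at +-1/sqrt(2),
    # strict saddle/maximum at the origin.
    return x**3 - x / 2

def noisy_grad(x, sigma=0.1):
    # Unbiased stochastic gradient with additive Gaussian noise.
    return grad(x) + sigma * rng.standard_normal(x.shape)

def sgd(x0, n_steps=10_000, gamma0=0.5, p=0.75):
    """Run SGD with the vanishing step-size gamma_n = gamma0 / n**p."""
    x = x0.copy()
    for n in range(1, n_steps + 1):
        gamma = gamma0 / n**p      # Theta(1/n^p) schedule
        x -= gamma * noisy_grad(x)
    return x

x_final = sgd(rng.standard_normal(5))
print("final iterate:", x_final)   # coordinates settle near +-1/sqrt(2)
```

Exponents $p \in (1/2, 1]$ in this sketch satisfy the classical Robbins-Monro summability conditions; the paper's analysis covers a broader family of step-size policies than the single power-law schedule enforced here.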

Cite

Text

Mertikopoulos et al. "On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems." Neural Information Processing Systems, 2020.

Markdown

[Mertikopoulos et al. "On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems." Neural Information Processing Systems, 2020.](https://mlanthology.org/neurips/2020/mertikopoulos2020neurips-almost/)

BibTeX

@inproceedings{mertikopoulos2020neurips-almost,
  title     = {{On the Almost Sure Convergence of Stochastic Gradient Descent in Non-Convex Problems}},
  author    = {Mertikopoulos, Panayotis and Hallak, Nadav and Kavis, Ali and Cevher, Volkan},
  booktitle = {Neural Information Processing Systems},
  year      = {2020},
  url       = {https://mlanthology.org/neurips/2020/mertikopoulos2020neurips-almost/}
}