“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions

Abstract

We develop and analyze a variant of Nesterov’s accelerated gradient descent (AGD) for minimization of smooth non-convex functions. We prove that one of two cases occurs: either our AGD variant converges quickly, as if the function was convex, or we produce a certificate that the function is “guilty” of being non-convex. This non-convexity certificate allows us to exploit negative curvature and obtain deterministic, dimension-free acceleration of convergence for non-convex functions. For a function $f$ with Lipschitz continuous gradient and Hessian, we compute a point $x$ with $\|\nabla f(x)\| \le \epsilon$ in $O(\epsilon^{-7/4} \log(1/ \epsilon) )$ gradient and function evaluations. Assuming additionally that the third derivative is Lipschitz, we require only $O(\epsilon^{-5/3} \log(1/ \epsilon) )$ evaluations.

Cite

Text

Carmon et al. "“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions." International Conference on Machine Learning, 2017.

Markdown

[Carmon et al. "“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions." International Conference on Machine Learning, 2017.](https://mlanthology.org/icml/2017/carmon2017icml-convex/)

BibTeX

@inproceedings{carmon2017icml-convex,
  title     = {{“Convex Until Proven Guilty”: Dimension-Free Acceleration of Gradient Descent on Non-Convex Functions}},
  author    = {Carmon, Yair and Duchi, John C. and Hinder, Oliver and Sidford, Aaron},
  booktitle = {International Conference on Machine Learning},
  year      = {2017},
  pages     = {654-663},
  volume    = {70},
  url       = {https://mlanthology.org/icml/2017/carmon2017icml-convex/}
}