Natasha 2: Faster Non-Convex Optimization than SGD

Abstract

We design a stochastic algorithm to find $\varepsilon$-approximate local minima of any smooth nonconvex function at a rate of $O(\varepsilon^{-3.25})$, using only oracle access to stochastic gradients. The best rate known before this work was $O(\varepsilon^{-4})$, achieved by stochastic gradient descent (SGD).
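For reference, an $\varepsilon$-approximate local minimum is commonly defined by a first-order and a second-order condition; the sketch below uses the standard formulation, with the second-order tolerance $\delta$ left as a parameter (the paper specifies how $\delta$ scales with $\varepsilon$):

$$\|\nabla f(x)\| \le \varepsilon \qquad \text{and} \qquad \nabla^2 f(x) \succeq -\delta \cdot I.$$

The first condition asks for a near-stationary point; the second rules out directions of strong negative curvature, which is what distinguishes an approximate local minimum from a mere approximate stationary point.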

Cite

Text

Allen-Zhu. "Natasha 2: Faster Non-Convex Optimization than SGD." Neural Information Processing Systems, 2018.

Markdown

[Allen-Zhu. "Natasha 2: Faster Non-Convex Optimization than SGD." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/allenzhu2018neurips-natasha/)

BibTeX

@inproceedings{allenzhu2018neurips-natasha,
  title     = {{Natasha 2: Faster Non-Convex Optimization than SGD}},
  author    = {Allen-Zhu, Zeyuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {2675--2686},
  url       = {https://mlanthology.org/neurips/2018/allenzhu2018neurips-natasha/}
}