Natasha 2: Faster Non-Convex Optimization than SGD
Abstract
We design a stochastic algorithm to find $\varepsilon$-approximate local minima of any smooth nonconvex function at a rate of $O(\varepsilon^{-3.25})$, with only oracle access to stochastic gradients. The best result before this work was $O(\varepsilon^{-4})$ by stochastic gradient descent (SGD).
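For context, a point $x$ is typically called an $\varepsilon$-approximate local minimum when its gradient is small and its Hessian has no strongly negative curvature direction. The display below is a common second-order formalization from the literature, not necessarily the exact parametrization used in the paper; the tolerance $\delta$ on the smallest Hessian eigenvalue is an assumption here, with $\delta = \sqrt{\varepsilon}$ (up to smoothness constants) being a frequent choice.

% A common notion of an epsilon-approximate (second-order) local minimum;
% the exact choice of delta may differ from the paper's parametrization.
\[
  \|\nabla f(x)\| \le \varepsilon
  \quad\text{and}\quad
  \nabla^2 f(x) \succeq -\delta\, I .
\]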
Cite
Text
Allen-Zhu. "Natasha 2: Faster Non-Convex Optimization than SGD." Neural Information Processing Systems, 2018.Markdown
[Allen-Zhu. "Natasha 2: Faster Non-Convex Optimization than SGD." Neural Information Processing Systems, 2018.](https://mlanthology.org/neurips/2018/allenzhu2018neurips-natasha/)BibTeX
@inproceedings{allenzhu2018neurips-natasha,
  title     = {{Natasha 2: Faster Non-Convex Optimization than SGD}},
  author    = {Allen-Zhu, Zeyuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2018},
  pages     = {2675--2686},
  url       = {https://mlanthology.org/neurips/2018/allenzhu2018neurips-natasha/}
}