Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Abstract

We design an algorithm which finds an $\epsilon$-approximate stationary point (with $\|\nabla F(x)\|\le \epsilon$) using $O(\epsilon^{-3})$ stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and—surprisingly—that it cannot be improved using stochastic $p$th order methods for any $p\ge 2$, even when the first $p$ derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding $(\epsilon,\gamma)$-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
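
As a rough illustration of the oracle model in the abstract (a sketch built on toy, made-up oracles; not the authors' algorithm), the snippet below combines stochastic gradients with stochastic Hessian-vector products in a recursive, variance-reduced gradient estimate with periodic restarts. The Hessian-vector correction along each step stands in for the stronger assumption mentioned in the abstract, namely re-querying the gradient at two nearby points with the same random seed. For reference, an $(\epsilon,\gamma)$-approximate second-order stationary point is commonly defined by $\|\nabla F(x)\|\le\epsilon$ together with $\nabla^2 F(x)\succeq-\gamma I$.

```python
# Illustrative sketch only: toy stochastic-gradient and Hessian-vector-product
# oracles for f(x) = 0.5 * ||x||^2, and a simple recursive gradient estimator.
# All names, step sizes, and schedules here are assumptions for this example,
# not details taken from the paper.
import numpy as np

def stochastic_grad(x, seed):
    """Toy stochastic gradient oracle: grad f(x; z) = x + noise."""
    rng = np.random.default_rng(seed)
    return x + 0.1 * rng.standard_normal(x.shape)

def stochastic_hvp(x, v, seed):
    """Toy stochastic Hessian-vector product oracle: (I + noise) acting on v."""
    rng = np.random.default_rng(seed)
    return v + 0.1 * rng.standard_normal(x.shape)

def find_stationary_point(x0, eta=0.05, T=500, epoch=50, batch=25):
    """Run T steps; return the iterate with the smallest estimated gradient norm."""
    x = x0.copy()
    g = stochastic_grad(x, seed=0)
    best_x, best_norm = x.copy(), np.linalg.norm(g)
    for t in range(1, T + 1):
        x_new = x - eta * g  # gradient step on the current estimate
        if t % epoch == 0:
            # Epoch-style restart: refresh the estimate with a mini-batch of
            # fresh stochastic gradients to keep the accumulated error in check.
            g = np.mean(
                [stochastic_grad(x_new, seed=1000 * t + i) for i in range(batch)],
                axis=0,
            )
        else:
            # Correct the running estimate with a stochastic Hessian-vector
            # product along the step, instead of querying the gradient at both
            # endpoints with a shared random seed.
            g = g + stochastic_hvp(x, x_new - x, seed=t)
        x = x_new
        if np.linalg.norm(g) < best_norm:
            best_x, best_norm = x.copy(), np.linalg.norm(g)
    return best_x

x_hat = find_stationary_point(np.ones(10))
```

The periodic mini-batch refresh is what keeps the recursive estimate usable in this sketch: without it, the noise added by the Hessian-vector corrections would accumulate over the run.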

Cite

Text

Arjevani et al. "Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations." Conference on Learning Theory, 2020.

Markdown

[Arjevani et al. "Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations." Conference on Learning Theory, 2020.](https://mlanthology.org/colt/2020/arjevani2020colt-secondorder/)

BibTeX

@inproceedings{arjevani2020colt-secondorder,
  title     = {{Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations}},
  author    = {Arjevani, Yossi and Carmon, Yair and Duchi, John C. and Foster, Dylan J. and Sekhari, Ayush and Sridharan, Karthik},
  booktitle = {Conference on Learning Theory},
  year      = {2020},
  pages     = {242-299},
  volume    = {125},
  url       = {https://mlanthology.org/colt/2020/arjevani2020colt-secondorder/}
}