Continuous-Time Models for Stochastic Optimization Algorithms

Abstract

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods. We exploit these continuous-time models, together with a simple Lyapunov analysis and tools from stochastic calculus, to derive convergence bounds for various types of non-convex functions. Guided by this analysis, we show that the same Lyapunov arguments hold in discrete time, leading to matching rates. In addition, we use these models and Itô calculus to derive novel insights into the dynamics of SGD, proving that a decreasing learning rate acts as time warping or, equivalently, as landscape stretching.
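To illustrate the kind of continuous-time model the abstract refers to, the sketch below simulates mini-batch SGD on a toy quadratic alongside an Euler–Maruyama discretization of a stochastic differential equation of the form dX = -∇f(X) dt + σ√η dW. This is only a minimal illustration under assumed settings (quadratic objective, Gaussian gradient noise, constant learning rate η); it is not the paper's exact model or analysis.

```python
import numpy as np

# Hypothetical objective f(x) = 0.5 * x^2, so grad f(x) = x (for illustration only).
def grad(x):
    return x

rng = np.random.default_rng(0)
eta = 0.01      # learning rate (assumed constant in this sketch)
sigma = 0.1     # assumed scale of the mini-batch gradient noise
steps = 1000

# Discrete SGD with noisy gradients: x_{k+1} = x_k - eta * (grad f(x_k) + noise_k)
x_sgd = 1.0
for _ in range(steps):
    x_sgd -= eta * (grad(x_sgd) + sigma * rng.standard_normal())

# Euler-Maruyama simulation of the SDE dX = -grad f(X) dt + sigma*sqrt(eta) dW,
# using dt = eta so that one SDE step matches one SGD iteration in scale.
dt = eta
x_sde = 1.0
for _ in range(steps):
    x_sde += -grad(x_sde) * dt + sigma * np.sqrt(eta) * np.sqrt(dt) * rng.standard_normal()

# Both trajectories should contract toward the minimizer x = 0.
print(abs(x_sgd) < 0.3, abs(x_sde) < 0.3)
```

Note that with dt = η the Euler–Maruyama update coincides term-by-term with the noisy SGD step, which is the intuition behind using an SDE as a continuous-time surrogate for the discrete algorithm.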

Cite

Text

Orvieto and Lucchi. "Continuous-Time Models for Stochastic Optimization Algorithms." Neural Information Processing Systems, 2019.

Markdown

[Orvieto and Lucchi. "Continuous-Time Models for Stochastic Optimization Algorithms." Neural Information Processing Systems, 2019.](https://mlanthology.org/neurips/2019/orvieto2019neurips-continuoustime/)

BibTeX

@inproceedings{orvieto2019neurips-continuoustime,
  title     = {{Continuous-Time Models for Stochastic Optimization Algorithms}},
  author    = {Orvieto, Antonio and Lucchi, Aurelien},
  booktitle = {Neural Information Processing Systems},
  year      = {2019},
  pages     = {12610--12622},
  url       = {https://mlanthology.org/neurips/2019/orvieto2019neurips-continuoustime/}
}