Why Momentum Really Works

Abstract

Distill articles are interactive publications and do not include traditional abstracts; this summary was written for the ML Anthology. The article provides a rigorous mathematical analysis of momentum in gradient-based optimization, using convex quadratic functions as a model problem. It demonstrates why momentum achieves a quadratic speedup over standard gradient descent and how it enables larger step sizes.
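The dynamics the article analyzes can be sketched in a few lines. Below is a minimal, illustrative comparison of plain gradient descent and heavy-ball momentum on a convex quadratic f(w) = ½ wᵀAw; the matrix A, step sizes, and iteration count are example choices (not taken from the article), while the two-step momentum update (velocity accumulation, then a step) follows the standard form analyzed there.

```python
import numpy as np

# Illustrative ill-conditioned quadratic: eigenvalues 1 and 100
# (condition number kappa = 100). These numbers are example choices.
A = np.diag([1.0, 100.0])

def gradient(w):
    # Gradient of f(w) = 0.5 * w^T A w
    return A @ w

def gradient_descent(w0, alpha, steps):
    w = w0.copy()
    for _ in range(steps):
        w = w - alpha * gradient(w)
    return w

def momentum(w0, alpha, beta, steps):
    # Heavy-ball momentum: accumulate a decaying velocity z,
    # then step along it.
    w, z = w0.copy(), np.zeros_like(w0)
    for _ in range(steps):
        z = beta * z + gradient(w)
        w = w - alpha * z
    return w

w0 = np.array([1.0, 1.0])

# Standard tunings for eigenvalues in [mu, L] = [1, 100]:
# GD uses alpha = 1/L; momentum tolerates the larger
# alpha = 4/(sqrt(L)+sqrt(mu))^2 with beta = ((sqrt(L)-sqrt(mu))/(sqrt(L)+sqrt(mu)))^2.
err_gd = np.linalg.norm(gradient_descent(w0, 1 / 100, 200))
err_mom = np.linalg.norm(momentum(w0, 4 / 11**2, (9 / 11) ** 2, 200))
print(err_gd, err_mom)
```

With these tunings, the momentum iterate contracts at roughly (√κ − 1)/(√κ + 1) per step versus (κ − 1)/(κ + 1) for gradient descent, which is the quadratic speedup in κ the article explains.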

Cite

Text

Goh. "Why Momentum Really Works." Distill, 2017. doi:10.23915/distill.00006

Markdown

[Goh. "Why Momentum Really Works." Distill, 2017.](https://mlanthology.org/distill/2017/goh2017distill-momentum/) doi:10.23915/distill.00006

BibTeX

@article{goh2017distill-momentum,
  title     = {{Why Momentum Really Works}},
  author    = {Goh, Gabriel},
  journal   = {Distill},
  year      = {2017},
  doi       = {10.23915/distill.00006},
  url       = {https://mlanthology.org/distill/2017/goh2017distill-momentum/}
}