Why Momentum Really Works
Abstract
Distill articles are interactive publications and do not include traditional abstracts; this summary was written for the ML Anthology. The article provides a rigorous mathematical analysis of momentum in gradient-based optimization, using convex quadratic functions as a model problem. It demonstrates why momentum achieves a quadratic speedup over standard gradient descent and how it enables larger step sizes.
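The speedup described above can be illustrated with a minimal sketch of heavy-ball momentum on a convex quadratic, the model problem the article analyzes. The matrix, step sizes, and momentum coefficient below are illustrative assumptions (a condition number of 100, with the classical optimal tunings for each method), not values taken from the article.

```python
import numpy as np

# Quadratic model problem f(w) = 0.5 * w @ A @ w with condition number 100.
# A, the iterate count, and the starting point are illustrative choices.
A = np.diag([1.0, 100.0])
lam_min, lam_max = 1.0, 100.0

def run(alpha, beta, steps=200):
    """Heavy-ball momentum: z accumulates a velocity, w steps along it.
    beta = 0 recovers plain gradient descent."""
    w = np.array([1.0, 1.0])
    z = np.zeros_like(w)
    for _ in range(steps):
        z = beta * z + A @ w      # velocity update with gradient A @ w
        w = w - alpha * z         # parameter update along the velocity
    return np.linalg.norm(w)      # distance to the optimum w* = 0

# Plain gradient descent with its optimal step size 2 / (lam_min + lam_max).
plain = run(alpha=2 / (lam_min + lam_max), beta=0.0)

# Momentum with the classical optimal tuning: the convergence rate improves
# from (kappa - 1)/(kappa + 1) to (sqrt(kappa) - 1)/(sqrt(kappa) + 1).
sqrt_k = np.sqrt(lam_max / lam_min)
mom = run(alpha=(2 / (np.sqrt(lam_max) + np.sqrt(lam_min))) ** 2,
          beta=((sqrt_k - 1) / (sqrt_k + 1)) ** 2)

print(plain, mom)  # momentum ends far closer to the optimum
```

With these tunings, momentum's error after 200 steps is many orders of magnitude smaller than gradient descent's, reflecting the quadratic improvement in the dependence on the condition number.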
Cite
Text

Goh. "Why Momentum Really Works." Distill, 2017. doi:10.23915/distill.00006

Markdown

[Goh. "Why Momentum Really Works." Distill, 2017.](https://mlanthology.org/distill/2017/goh2017distill-momentum/) doi:10.23915/distill.00006

BibTeX
@article{goh2017distill-momentum,
title = {{Why Momentum Really Works}},
author = {Goh, Gabriel},
journal = {Distill},
year = {2017},
doi = {10.23915/distill.00006},
url = {https://mlanthology.org/distill/2017/goh2017distill-momentum/}
}