Dimension-Adapted Momentum Outscales SGD

Abstract

We investigate scaling laws for stochastic momentum algorithms on the power-law random features model, parameterized by data complexity, target complexity, and model size. Our analysis reveals four distinct loss curve shapes, determined by the data and target complexities, when training with a stochastic momentum algorithm. While traditional stochastic gradient descent with momentum (SGD-M) yields scaling law exponents identical to SGD's, dimension-adapted Nesterov acceleration (DANA) improves these exponents by scaling its momentum hyperparameters with model size and data complexity. DANA achieves this outscaling phenomenon, which also improves compute-optimal scaling behavior, across a broad range of data and target complexities where traditional methods fall short. Extensive experiments on high-dimensional synthetic quadratics validate our theoretical predictions, and large-scale text experiments with LSTMs show that DANA's improved loss exponents over SGD hold in a practical setting.
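To make the contrast drawn in the abstract concrete, the sketch below trains a power-law random features style quadratic with (i) a fixed momentum constant, as in SGD-M, and (ii) a Nesterov-style update whose momentum schedule depends on the model dimension. The power-law exponents, step size, and the dimension-dependent schedule `momentum_fn` are illustrative assumptions, not the paper's exact DANA parameterization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy problem: quadratic with a power-law covariance spectrum (data
# complexity alpha) and power-law target coefficients (target complexity beta).
d, alpha, beta = 512, 1.2, 0.8
eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)   # covariance eigenvalues
w_star = np.arange(1, d + 1, dtype=float) ** (-beta)  # target coefficients

def sgd_grad(w, batch=32):
    """Stochastic gradient of 0.5*E[(x^T w - x^T w_star)^2], x ~ N(0, diag(eigs))."""
    x = rng.normal(size=(batch, d)) * np.sqrt(eigs)
    resid = x @ (w - w_star)
    return x.T @ resid / batch

def run(steps, lr, momentum_fn):
    """Stochastic Nesterov-style loop; momentum_fn(t) gives the momentum at step t."""
    w = np.zeros(d)
    v = np.zeros(d)
    losses = []
    for t in range(steps):
        g = sgd_grad(w + momentum_fn(t) * v)   # look-ahead gradient
        v = momentum_fn(t) * v - lr * g
        w = w + v
        losses.append(0.5 * np.sum(eigs * (w - w_star) ** 2))  # population risk
    return losses

# SGD-M: momentum fixed, independent of model size.
sgdm = run(steps=2000, lr=0.1, momentum_fn=lambda t: 0.9)

# Dimension-adapted schedule (hypothetical): momentum warms up toward a value
# set by the model size d, mimicking the idea of scaling momentum with dimension.
dana_like = run(steps=2000, lr=0.1,
                momentum_fn=lambda t: 1.0 - min(1.0, d ** -0.5 + 1.0 / (t + 1)))

print(f"final loss, fixed momentum (SGD-M-like):   {sgdm[-1]:.3e}")
print(f"final loss, dimension-adapted momentum:    {dana_like[-1]:.3e}")
```

Sweeping `d` and re-fitting the loss-versus-iteration curves on a log-log scale is one way to probe whether a dimension-dependent momentum schedule changes the empirical scaling exponent relative to fixed momentum; the schedule above is only a stand-in for the paper's analysis.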

Cite

Text

Ferbach et al. "Dimension-Adapted Momentum Outscales SGD." Advances in Neural Information Processing Systems, 2025.

Markdown

[Ferbach et al. "Dimension-Adapted Momentum Outscales SGD." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/ferbach2025neurips-dimensionadapted/)

BibTeX

@inproceedings{ferbach2025neurips-dimensionadapted,
  title     = {{Dimension-Adapted Momentum Outscales SGD}},
  author    = {Ferbach, Damien and Everett, Katie and Gidel, Gauthier and Paquette, Elliot and Paquette, Courtney},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/ferbach2025neurips-dimensionadapted/}
}