Stepping on the Edge: Curvature Aware Learning Rate Tuners

Roulet, Vincent; Agarwala, Atish; Grill, Jean-Bastien; Swirszcz, Grzegorz; Blondel, Mathieu; Pedregosa, Fabian

doi:10.52202/079017-1511

NeurIPS 2024

Abstract

Curvature information -- particularly, the largest eigenvalue of the loss Hessian, known as the sharpness -- often forms the basis for learning rate tuners. However, recent work has shown that the curvature information undergoes complex dynamics during training, going from a phase of increasing sharpness to eventual stabilization. We analyze the closed-loop feedback effect between learning rate tuning and curvature. We find that classical learning rate tuners may yield greater one-step loss reduction, yet they ultimately underperform in the long term when compared to constant learning rates in the full batch regime. These models break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature. To further investigate these effects, we introduce a new learning rate tuning method, Curvature Dynamics Aware Tuning (CDAT), which prioritizes long term curvature stabilization over instantaneous progress on the objective. In the full batch regime, CDAT shows behavior akin to prefixed warm-up schedules on deep learning objectives, outperforming tuned constant learning rates. In the mini-batch regime, we observe that stochasticity introduces confounding effects that explain the previous success of some learning rate tuners at appropriate batch sizes. Our findings highlight the critical role of understanding the joint dynamics of the learning rate and curvature, beyond greedy minimization, to diagnose failures and design effective adaptive learning rate tuners.
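
To make the notion of sharpness in the abstract concrete, the following is a minimal illustrative sketch, not the paper's CDAT method or any of the tuners it studies. It assumes a toy quadratic objective f(x) = 0.5 x^T H x with a fixed, hand-picked 2x2 Hessian H, estimates the sharpness (largest Hessian eigenvalue) by power iteration, and relies on the standard fact that gradient descent on a quadratic is stable only when the learning rate is below 2 / sharpness. The function names `matvec`, `sharpness`, and `gd_losses` are hypothetical helpers introduced here for illustration.

```python
import math

def matvec(H, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(h * x for h, x in zip(row, v)) for row in H]

def sharpness(H, iters=200):
    """Estimate the largest eigenvalue of a symmetric PSD matrix by power iteration."""
    v = [1.0] * len(H)
    for _ in range(iters):
        w = matvec(H, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the (unit-norm) converged vector.
    return sum(a * b for a, b in zip(v, matvec(H, v)))

def gd_losses(H, x0, lr, steps):
    """Run gradient descent on f(x) = 0.5 x^T H x; return the loss before each step."""
    x = list(x0)
    losses = []
    for _ in range(steps):
        g = matvec(H, x)  # gradient of the quadratic is H x
        losses.append(0.5 * sum(a * b for a, b in zip(x, g)))
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return losses

# Assumed toy Hessian (symmetric, positive definite), chosen only for illustration.
H = [[4.0, 1.0], [1.0, 3.0]]
lam = sharpness(H)

# Learning rates just below and just above the 2 / sharpness threshold.
stable = gd_losses(H, [1.0, 1.0], 1.9 / lam, 50)
unstable = gd_losses(H, [1.0, 1.0], 2.1 / lam, 50)

print(f"sharpness estimate: {lam:.6f}")
print(f"lr = 1.9/sharpness: loss {stable[0]:.3f} -> {stable[-1]:.3e}")
print(f"lr = 2.1/sharpness: loss {unstable[0]:.3f} -> {unstable[-1]:.3e}")
```

On this fixed quadratic the Hessian never changes, so the sketch only shows the stability threshold itself. The feedback loop the abstract describes arises in deep learning objectives, where the sharpness evolves during training and interacts with the learning rate.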

Cite

Text

Roulet et al. "Stepping on the Edge: Curvature Aware Learning Rate Tuners." Neural Information Processing Systems, 2024. doi:10.52202/079017-1511

Markdown

[Roulet et al. "Stepping on the Edge: Curvature Aware Learning Rate Tuners." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/roulet2024neurips-stepping/) doi:10.52202/079017-1511

BibTeX

@inproceedings{roulet2024neurips-stepping,
  title     = {{Stepping on the Edge: Curvature Aware Learning Rate Tuners}},
  author    = {Roulet, Vincent and Agarwala, Atish and Grill, Jean-Bastien and Swirszcz, Grzegorz and Blondel, Mathieu and Pedregosa, Fabian},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1511},
  url       = {https://mlanthology.org/neurips/2024/roulet2024neurips-stepping/}
}