The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods

Abstract

Adaptive gradient methods, notably Adam~\citep{kingma2014adam, loshchilov2017decoupled}, have become indispensable for optimizing neural networks, particularly Transformers~\citep{vaswani2017attention, dosovitskiy2020an}. In this paper, we present a novel optimization anomaly, the \emph{Slingshot Effect}, which manifests during extremely late stages of training. A distinctive characteristic of this phenomenon is a sequence of cyclic phase transitions between stable and unstable training regimes, evidenced by the cyclic behavior of the norm of the last layer's weights. Although the Slingshot Effect can be easily reproduced in more general settings, it is not explained by any known optimization theory, which calls for in-depth examination. Moreover, we observe that Grokking, as reported by~\citet{power2021grokking}, occurs predominantly at the onset of the Slingshot Effect and is absent without it, even when no explicit regularization is used. This finding suggests a surprising inductive bias of adaptive gradient optimizers at late training stages and calls for a revised theoretical analysis of its origin. Our study sheds light on an intriguing optimization behavior that has significant implications for understanding the inner workings of adaptive gradient methods.
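The abstract identifies the Slingshot Effect through the norm of the last layer's weights over training. The sketch below shows only how that quantity might be logged under Adam. It is not the authors' experimental setup and it does not claim to reproduce the effect.

Several assumptions are made here. The model is a hypothetical two-layer ReLU network trained on random synthetic data rather than a Transformer on an algorithmic task. Adam is written out by hand, following the standard update of Kingma and Ba. The helper names `adam_step`, `new_state`, and `train_and_track_last_layer_norm` are illustrative. Observing cyclic norm behavior would require far longer training and the paper's actual settings.

```python
import numpy as np


def new_state(w):
    """Fresh Adam state (first/second moments, step counter) for array ``w``."""
    return {"t": 0, "m": np.zeros_like(w), "v": np.zeros_like(w)}


def adam_step(w, g, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update of ``w`` given gradient ``g``; ``state`` is updated in place."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g       # first-moment EMA
    state["v"] = beta2 * state["v"] + (1 - beta2) * g * g   # second-moment EMA
    m_hat = state["m"] / (1 - beta1 ** state["t"])          # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)


def train_and_track_last_layer_norm(steps=200, lr=1e-3, seed=0):
    """Train a hypothetical 2-layer ReLU MLP with Adam; return ||W2||_F after each step."""
    rng = np.random.default_rng(seed)
    n, d, h, k = 64, 8, 16, 3                 # samples, input dim, hidden width, classes
    X = rng.normal(size=(n, d))               # synthetic inputs (assumption)
    y = rng.integers(0, k, size=n)            # random labels (assumption)
    Y = np.eye(k)[y]                          # one-hot targets

    W1 = rng.normal(scale=1 / np.sqrt(d), size=(d, h))
    W2 = rng.normal(scale=1 / np.sqrt(h), size=(h, k))  # the "last layer"
    s1, s2 = new_state(W1), new_state(W2)

    norms = []
    for _ in range(steps):
        # Forward pass: ReLU hidden layer, then softmax over the logits.
        Z = X @ W1
        H = np.maximum(Z, 0.0)
        logits = H @ W2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)

        # Backward pass for the mean cross-entropy loss.
        dlogits = (P - Y) / n
        gW2 = H.T @ dlogits
        dZ = (dlogits @ W2.T) * (Z > 0)
        gW1 = X.T @ dZ

        W1 = adam_step(W1, gW1, s1, lr=lr)
        W2 = adam_step(W2, gW2, s2, lr=lr)
        norms.append(float(np.linalg.norm(W2)))  # Frobenius norm of the last layer
    return norms


if __name__ == "__main__":
    norms = train_and_track_last_layer_norm(steps=200)
    print(f"last-layer norm: start={norms[0]:.4f}, end={norms[-1]:.4f}")
```

In practice, one would plot `norms` against the training step over a very long run and look for the alternating stable and unstable phases described above. Keeping the per-step norm as a plain list makes it easy to log alongside training and validation accuracy.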

Cite

Text

Thilak et al. "The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods." Transactions on Machine Learning Research, 2024.

Markdown

[Thilak et al. "The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/thilak2024tmlr-slingshot/)

BibTeX

@article{thilak2024tmlr-slingshot,
  title     = {{The Slingshot Effect: A Late-Stage Optimization Anomaly in Adaptive Gradient Methods}},
  author    = {Thilak, Vimal and Littwin, Etai and Zhai, Shuangfei and Saremi, Omid and Paiss, Roni and Susskind, Joshua M.},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/thilak2024tmlr-slingshot/}
}