What Can Grokking Teach Us About Learning Under Non-Stationarity?

Abstract

In continual learning problems, it is often necessary to overwrite components of a neural network’s learned representation in response to changes in the data stream; however, neural networks often exhibit primacy bias, whereby early training data hinders the network’s ability to generalize on later tasks. While the feature-learning dynamics of non-stationary learning problems are not well studied, the emergence of feature-learning dynamics is known to drive the phenomenon of grokking, wherein neural networks initially memorize their training data and only later exhibit perfect generalization. This work conjectures that the same feature-learning dynamics which facilitate generalization in grokking also underlie the ability to overwrite previously learned features, and that methods which accelerate grokking by facilitating feature-learning dynamics are promising candidates for addressing primacy bias in non-stationary learning problems. We then propose a straightforward method to induce feature-learning dynamics as needed throughout training by increasing the effective learning rate, i.e., the ratio between parameter and update norms. We show that this approach both facilitates feature learning and improves generalization in a variety of settings, including grokking, warm-starting neural network training, and reinforcement learning tasks.
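To make the notion of an effective learning rate concrete, the following is a minimal sketch and not the method proposed in the paper, which the abstract does not specify. It assumes the effective learning rate is measured as the update norm divided by the parameter norm, and it uses a hypothetical toy loss that depends only on the direction of the weights (as is roughly the case for weights feeding into a normalization layer). All function and variable names are illustrative. Under these assumptions, shrinking the parameter norm while keeping the nominal learning rate fixed raises the effective learning rate.

```python
import math


def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def effective_lr(params, update):
    """Assumed definition: update norm divided by parameter norm."""
    return norm(update) / norm(params)


def scale_invariant_grad(params, target):
    """Gradient of the toy loss L(w) = 0.5 * ||w / ||w|| - target||^2.

    The loss depends only on the direction of w, so its gradient has no
    radial component and shrinks as 1 / ||w||.
    """
    n = norm(params)
    u = [p / n for p in params]                  # unit direction of w
    r = [ui - ti for ui, ti in zip(u, target)]   # dL/du
    ru = sum(ri * ui for ri, ui in zip(r, u))    # radial part of dL/du
    return [(ri - ru * ui) / n for ri, ui in zip(r, u)]


def sgd_update(params, target, lr):
    """Plain SGD step direction: -lr * gradient."""
    return [-lr * g for g in scale_invariant_grad(params, target)]


lr = 0.1                 # nominal learning rate, kept fixed throughout
target = [0.0, 1.0]      # hypothetical target direction
w = [3.0, 4.0]           # parameters with norm 5

elr_before = effective_lr(w, sgd_update(w, target, lr))

# Halve the parameter norm without changing the direction (and hence
# without changing the value of the scale-invariant loss).
w_small = [0.5 * x for x in w]
elr_after = effective_lr(w_small, sgd_update(w_small, target, lr))

print(f"effective lr before rescaling: {elr_before:.4f}")
print(f"effective lr after rescaling:  {elr_after:.4f}")
```

In this toy setting, halving the parameter norm doubles the gradient norm and halves the denominator, so the effective learning rate grows by a factor of four even though the nominal learning rate is unchanged. Whether a real network behaves this way depends on its architecture and optimizer.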

Cite

Text

Lyle et al. "What Can Grokking Teach Us About Learning Under Non-Stationarity?" Proceedings of The 4th Conference on Lifelong Learning Agents, 2025.

Markdown

[Lyle et al. "What Can Grokking Teach Us About Learning Under Non-Stationarity?" Proceedings of The 4th Conference on Lifelong Learning Agents, 2025.](https://mlanthology.org/collas/2025/lyle2025collas-grokking/)

BibTeX

@inproceedings{lyle2025collas-grokking,
  title     = {{What Can Grokking Teach Us About Learning Under Non-Stationarity?}},
  author    = {Lyle, Clare and Sokar, Ghada and György, András and Pascanu, Razvan},
  booktitle = {Proceedings of The 4th Conference on Lifelong Learning Agents},
  year      = {2025},
  pages     = {635--656},
  volume    = {330},
  url       = {https://mlanthology.org/collas/2025/lyle2025collas-grokking/}
}