Asymptotic Dynamics for Delayed Feature Learning in a Toy Model

Abstract

We consider a toy model that exhibits grokking, recently advanced by [Kumar et al., 2023], and take advantage of the simple setting to derive the dynamics of the train and test loss using Dynamical Mean Field Theory (DMFT). This gives a closed-form expression for the gap between train and test loss that characterizes grokking in this toy model, illustrating how two parameters of interest -- NTK alignment and network laziness -- control the size of this gap and how grokking emerges as a uniquely offline property during repeated training over the same dataset. This is the first quantitative characterization of grokking dynamics in a general setting that makes no assumptions about weight decay, weight norm, etc.

Cite

Text

Bordelon et al. "Asymptotic Dynamics for Delayed Feature Learning in a Toy Model." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Bordelon et al. "Asymptotic Dynamics for Delayed Feature Learning in a Toy Model." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/bordelon2024icmlw-asymptotic/)

BibTeX

@inproceedings{bordelon2024icmlw-asymptotic,
  title     = {{Asymptotic Dynamics for Delayed Feature Learning in a Toy Model}},
  author    = {Bordelon, Blake and Kumar, Tanishq and Gershman, Samuel J. and Pehlevan, Cengiz},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/bordelon2024icmlw-asymptotic/}
}