Grokking Phase Transitions in Learning Local Rules with Gradient Descent

Abstract

We discuss two solvable grokking (generalisation beyond overfitting) models in a rule-learning scenario. We show that grokking is a phase transition and find exact analytic expressions for the critical exponents, grokking probability, and grokking time distribution. Further, we introduce a tensor network map that connects the proposed grokking setup with the standard (perceptron) statistical learning theory and provide evidence that grokking is a consequence of the locality of the teacher model. We analyze the rule-30 cellular automaton learning task, numerically determine the critical exponent and the grokking time distribution, and compare them with the prediction of the proposed grokking model. Finally, we numerically study the connection between structure formation and grokking.

Cite

Text

Žunkovič and Ilievski. "Grokking Phase Transitions in Learning Local Rules with Gradient Descent." Journal of Machine Learning Research, 2024.

Markdown

[Žunkovič and Ilievski. "Grokking Phase Transitions in Learning Local Rules with Gradient Descent." Journal of Machine Learning Research, 2024.](https://mlanthology.org/jmlr/2024/zunkovic2024jmlr-grokking/)

BibTeX

@article{zunkovic2024jmlr-grokking,
  title     = {{Grokking Phase Transitions in Learning Local Rules with Gradient Descent}},
  author    = {Žunkovič, Bojan and Ilievski, Enej},
  journal   = {Journal of Machine Learning Research},
  year      = {2024},
  pages     = {1--52},
  volume    = {25},
  url       = {https://mlanthology.org/jmlr/2024/zunkovic2024jmlr-grokking/}
}