Grokking, Rank Minimization and Generalization in Deep Learning

Abstract

Much work has been devoted to explaining the recently discovered "grokking" phenomenon, in which a neural network first fits the training data and only many iterations later suddenly generalizes to the validation data. To explore this puzzling behavior, we examine the evolution of the singular values and singular vectors of the network's weight matrices over the course of training. First, we show that the transition to generalization in grokking coincides with the discovery of a low-rank solution in the weights. We then show that the trend toward rank minimization is far more general than grokking alone, and we elucidate the crucial role that weight decay plays in promoting this trend. Such analysis leads to a deeper understanding of generalization in practical systems.
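As a rough illustration of the kind of measurement the abstract describes, the sketch below (not the authors' code; it assumes PyTorch and uses illustrative names such as weight_spectra and effective_rank) shows one way to track the singular values of a model's weight matrices during training and summarize them as an effective rank.

import torch

@torch.no_grad()
def weight_spectra(model):
    """Return the singular values of every 2D weight matrix in the model."""
    spectra = {}
    for name, param in model.named_parameters():
        if param.ndim == 2:  # e.g. linear-layer weight matrices
            spectra[name] = torch.linalg.svdvals(param).cpu()
    return spectra

def effective_rank(singular_values, tol=1e-3):
    """Count singular values above a fraction `tol` of the largest one (a simple rank proxy)."""
    s = singular_values
    return int((s > tol * s.max()).sum())

Calling weight_spectra periodically during training (for example, once per epoch) and plotting effective_rank alongside validation accuracy is one way to check whether the jump in generalization coincides with a drop in weight rank, and repeating the run with and without weight decay (e.g. via the weight_decay argument of a PyTorch optimizer) probes the role the abstract attributes to that regularizer.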

Cite

Text

Yunis et al. "Grokking, Rank Minimization and Generalization in Deep Learning." ICML 2024 Workshops: MI, 2024.

Markdown

[Yunis et al. "Grokking, Rank Minimization and Generalization in Deep Learning." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/yunis2024icmlw-grokking/)

BibTeX

@inproceedings{yunis2024icmlw-grokking,
  title     = {{Grokking, Rank Minimization and Generalization in Deep Learning}},
  author    = {Yunis, David and Patel, Kumar Kshitij and Wheeler, Samuel and Savarese, Pedro Henrique Pamplona and Vardi, Gal and Livescu, Karen and Maire, Michael and Walter, Matthew},
  booktitle = {ICML 2024 Workshops: MI},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/yunis2024icmlw-grokking/}
}