Deep Networks Always Grok and Here Is Why
Abstract
Grokking, or {\em delayed generalization}, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near-zero training error. Previous studies have reported the occurrence of grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings, such as training a convolutional neural network (CNN) on CIFAR10 or a ResNet on Imagenette. We introduce the new concept of {\em delayed robustness}, whereby a DNN groks adversarial examples and becomes robust long after interpolation and/or generalization. We develop an analytical explanation for the emergence of both delayed generalization and delayed robustness based on the {\em local complexity} of a DNN's input-output mapping. Our {\em local complexity} measures the density of so-called ``linear regions'' (aka spline partition regions) that tile the DNN input space.
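To make the local complexity notion concrete, the sketch below estimates how densely linear regions tile the input space around a point by counting distinct ReLU activation patterns among random perturbations in a small neighborhood: each activation pattern corresponds to one linear region of the spline partition. This is a minimal illustration, not the paper's actual estimator; the toy MLP, sampling radius, and sample count are all assumptions.

```python
import torch
import torch.nn as nn

def activation_pattern(model, x):
    """Return the binary ReLU on/off pattern for input x as a hashable tuple."""
    pattern = []
    h = x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            # Post-ReLU positivity matches the pre-activation sign pattern.
            pattern.append((h > 0).flatten())
    return tuple(torch.cat(pattern).tolist())

def local_complexity(model, x, radius=0.05, n_samples=512):
    """Count distinct linear regions hit by samples in a ball around x.

    radius and n_samples are assumed hyperparameters of this sketch,
    not values taken from the paper.
    """
    noise = torch.randn(n_samples, *x.shape) * radius
    patterns = {activation_pattern(model, x + eps) for eps in noise}
    return len(patterns)

# Toy two-hidden-layer ReLU MLP (an assumption for illustration).
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
x = torch.randn(2)
print(local_complexity(model, x))  # higher count => denser partition near x
```

Tracking this count over training steps around training or test points gives one coarse proxy for how the spline partition evolves, which is the kind of quantity the abstract's explanation of delayed generalization and delayed robustness is built on.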
Cite
Text
Humayun et al. "Deep Networks Always Grok and Here Is Why." ICML 2024 Workshops: HiLD, 2024.
Markdown
[Humayun et al. "Deep Networks Always Grok and Here Is Why." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/humayun2024icmlw-deep/)
BibTeX
@inproceedings{humayun2024icmlw-deep,
  title     = {{Deep Networks Always Grok and Here Is Why}},
  author    = {Humayun, Ahmed Imtiaz and Balestriero, Randall and Baraniuk, Richard},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/humayun2024icmlw-deep/}
}