Deep Networks Always Grok and Here Is Why
Abstract
Grokking, or {\em delayed generalization}, is a phenomenon where generalization in a deep neural network (DNN) occurs long after achieving near-zero training error. Previous studies have reported the occurrence of grokking in specific controlled settings, such as DNNs initialized with large-norm parameters or transformers trained on algorithmic datasets. We demonstrate that grokking is actually much more widespread and materializes in a wide range of practical settings, such as training a convolutional neural network (CNN) on CIFAR10 or a ResNet on Imagenette. We introduce the new concept of {\em delayed robustness}, whereby a DNN groks adversarial examples and becomes robust long after interpolation and/or generalization. We develop an analytical explanation for the emergence of both delayed generalization and delayed robustness based on the {\em local complexity} of a DNN's input-output mapping. Our {\em local complexity} measures the density of so-called ``linear regions'' (aka spline partition regions) that tile the DNN input space.
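To make the local complexity notion concrete, the sketch below estimates how densely linear regions tile the input space around a point by counting distinct ReLU activation patterns among random perturbations in a small neighborhood: each activation pattern corresponds to one linear region of the spline partition. This is a minimal illustration, not the paper's actual estimator; the toy MLP, sampling radius, and sample count are all assumptions.

```python
import torch
import torch.nn as nn

def activation_pattern(model, x):
    """Return the binary ReLU on/off pattern for input x as a hashable tuple."""
    pattern = []
    h = x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.ReLU):
            # Post-ReLU positivity matches the pre-activation sign pattern.
            pattern.append((h > 0).flatten())
    return tuple(torch.cat(pattern).tolist())

def local_complexity(model, x, radius=0.05, n_samples=512):
    """Count distinct linear regions hit by samples in a ball around x.

    radius and n_samples are assumed hyperparameters of this sketch,
    not values taken from the paper.
    """
    noise = torch.randn(n_samples, *x.shape) * radius
    patterns = {activation_pattern(model, x + eps) for eps in noise}
    return len(patterns)

# Toy two-hidden-layer ReLU MLP (an assumption for illustration).
model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(),
                      nn.Linear(64, 64), nn.ReLU(),
                      nn.Linear(64, 1))
x = torch.randn(2)
print(local_complexity(model, x))  # higher count => denser partition near x
```

Tracking this count over training steps around training or test points gives one coarse proxy for how the spline partition evolves, which is the kind of quantity the abstract's explanation of delayed generalization and delayed robustness is built on.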
Cite
Text
Humayun et al. "Deep Networks Always Grok and Here Is Why." ICML 2024 Workshops: HiLD, 2024.
Markdown
[Humayun et al. "Deep Networks Always Grok and Here Is Why." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/humayun2024icmlw-deep/)
BibTeX
@inproceedings{humayun2024icmlw-deep,
  title     = {{Deep Networks Always Grok and Here Is Why}},
  author    = {Humayun, Ahmed Imtiaz and Balestriero, Randall and Baraniuk, Richard},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/humayun2024icmlw-deep/}
}