What Do Larger Image Classifiers Memorise?

Abstract

The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (“memorise”) completely random labels. To study this issue carefully, Feldman (2019) proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the corresponding memorisation profile of a ResNet on image classification benchmarks. While this provided an exciting first glimpse into what real-world models memorise, it leaves open a fundamental question: do larger neural models memorise more? This question aligns with the common practice of training models of different sizes, each offering a different cost-quality trade-off: larger models are typically observed to have higher quality, but it is of interest to understand whether this is merely a consequence of them memorising larger numbers of input-output patterns. We present a comprehensive empirical analysis of this question on image classification benchmarks. We find that training examples exhibit an unexpectedly diverse set of memorisation trajectories across model sizes: most samples exhibit decreased memorisation under larger models, while the rest exhibit cap-shaped or increasing memorisation. We show that various proxies for the Feldman (2019) memorisation score fail to capture these fundamental trends. Lastly, we find that knowledge distillation — an effective and popular model compression technique — tends to inhibit memorisation, while also improving generalisation. Specifically, memorisation is mostly inhibited on examples with increasing memorisation trajectories, thus pointing at how distillation improves generalisation.
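To make the central quantity concrete: the Feldman (2019) memorisation score of a training example is the gap between the model's probability of predicting its label when the example is included in training versus when it is held out. The sketch below is a minimal toy illustration of that definition, not the paper's setup — it uses a deterministic 1-nearest-neighbour "learner" and a single fit per branch (the actual estimation on ResNets uses Monte-Carlo sampling over many trained models); all function names here are hypothetical.

```python
import numpy as np

def memorisation_score(X, y, i, fit, predict):
    """Feldman (2019) memorisation score of training example i:
    P[h(x_i) = y_i | i included in training] - P[h(x_i) = y_i | i held out].
    Estimated here with one deterministic fit per branch for illustration.
    """
    mask = np.arange(len(X)) != i
    # Train with example i included, then check it is predicted correctly.
    acc_in = predict(fit(X, y), X[i]) == y[i]
    # Train with example i excluded (leave-one-out), then check again.
    acc_out = predict(fit(X[mask], y[mask]), X[i]) == y[i]
    return float(acc_in) - float(acc_out)

# Toy 1-nearest-neighbour "learner": training just stores the data,
# so memorisation behaviour is easy to see.
def fit_1nn(X, y):
    return X, y

def predict_1nn(model, x):
    Xt, yt = model
    return yt[np.argmin(np.linalg.norm(Xt - x, axis=1))]

# Two clean clusters, plus one atypical point (index 4) whose label
# contradicts all of its neighbours.
X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0], [1.0, 1.0]])
y = np.array([0, 0, 1, 1, 1])

print(memorisation_score(X, y, 4, fit_1nn, predict_1nn))  # → 1.0 (memorised)
print(memorisation_score(X, y, 0, fit_1nn, predict_1nn))  # → 0.0 (typical)
```

The atypical point gets score 1.0: it is only classified correctly because the model has seen (memorised) it, whereas a typical cluster point gets score 0.0 because nearby examples carry the same label regardless.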

Cite

Text

Lukasik et al. "What Do Larger Image Classifiers Memorise?" Transactions on Machine Learning Research, 2024.

Markdown

[Lukasik et al. "What Do Larger Image Classifiers Memorise?" Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/lukasik2024tmlr-larger/)

BibTeX

@article{lukasik2024tmlr-larger,
  title     = {{What Do Larger Image Classifiers Memorise?}},
  author    = {Lukasik, Michal and Nagarajan, Vaishnavh and Rawat, Ankit Singh and Menon, Aditya Krishna and Kumar, Sanjiv},
  journal   = {Transactions on Machine Learning Research},
  year      = {2024},
  url       = {https://mlanthology.org/tmlr/2024/lukasik2024tmlr-larger/}
}