Position: Understanding LLMs Requires More than Statistical Generalization

Patrik Reizinger, Szilvia Ujváry, Anna Mészáros, Anna Kerekes, Wieland Brendel, Ferenc Huszár

ICML 2024 pp. 42365-42390


Abstract

The last decade has seen blossoming research in deep learning theory attempting to answer, “Why does deep learning generalize?” A powerful shift in perspective precipitated this progress: the study of overparametrized models in the interpolation regime. In this paper, we argue that another perspective shift is due, since some of the desirable qualities of LLMs are not a consequence of good statistical generalization and require a separate theoretical explanation. Our core argument relies on the observation that autoregressive (AR) probabilistic models are inherently non-identifiable: models zero or near-zero KL divergence apart, and thus with equivalent test loss, can exhibit markedly different behaviors. We support our position with mathematical examples and empirical observations, illustrating why non-identifiability has practical relevance through three case studies: (1) the non-identifiability of zero-shot rule extrapolation; (2) the approximate non-identifiability of in-context learning; and (3) the non-identifiability of fine-tunability. We review promising research directions focusing on LLM-relevant generalization measures, transferability, and inductive biases.
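The non-identifiability claim can be illustrated with a toy example. The sketch below is not taken from the paper: the two-token setup, the probabilities, and the names `joint`, `kl`, `cond_a`, and `cond_b` are all illustrative assumptions. It builds two autoregressive models that both sit at exactly zero KL divergence from the data distribution, yet complete a never-observed prefix in opposite ways.

```python
import math

# Toy autoregressive (AR) model over binary sequences of length 2:
# p(x1, x2) = p(x1) * p(x2 | x1). All numbers are illustrative assumptions.

def joint(first, cond):
    """Joint distribution over (x1, x2) built from p(x1) and p(x2 | x1)."""
    return {(a, b): first[a] * cond[a][b] for a in (0, 1) for b in (0, 1)}

def kl(p, q):
    """KL(p || q), summing only over sequences with p > 0."""
    return sum(pv * math.log(pv / q[s]) for s, pv in p.items() if pv > 0)

# Data distribution: the first token is always 0, so the prefix x1 = 1 never occurs.
first = {0: 1.0, 1: 0.0}
in_support = {0: 0.5, 1: 0.5}  # p(x2 | x1 = 0), shared by data and both models

data = joint(first, {0: in_support, 1: {0: 0.5, 1: 0.5}})

# Two models that agree on the observed prefix but disagree on the unseen one.
cond_a = {0: in_support, 1: {0: 0.9, 1: 0.1}}
cond_b = {0: in_support, 1: {0: 0.1, 1: 0.9}}
model_a = joint(first, cond_a)
model_b = joint(first, cond_b)

kl_a = kl(data, model_a)
kl_b = kl(data, model_b)

# Next-token behavior when prompted with the out-of-distribution prefix x1 = 1.
ood_a = cond_a[1]
ood_b = cond_b[1]

print("KL(data || A) =", kl_a, " KL(data || B) =", kl_b)
print("A given x1=1:", ood_a, " B given x1=1:", ood_b)
```

Because the likelihood never constrains the conditional on a zero-probability prefix, no amount of test loss on in-distribution data separates the two models, even though they behave differently on the out-of-distribution prompt.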


Cite

Text

Reizinger et al. "Position: Understanding LLMs Requires More than Statistical Generalization." International Conference on Machine Learning, 2024.

Markdown

[Reizinger et al. "Position: Understanding LLMs Requires More than Statistical Generalization." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/reizinger2024icml-position/)

BibTeX

@inproceedings{reizinger2024icml-position,
  title     = {{Position: Understanding LLMs Requires More than Statistical Generalization}},
  author    = {Reizinger, Patrik and Ujváry, Szilvia and Mészáros, Anna and Kerekes, Anna and Brendel, Wieland and Huszár, Ferenc},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {42365--42390},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/reizinger2024icml-position/}
}