Non-Vacuous Generalization Bounds for Large Language Models

Abstract

Modern language models can contain billions of parameters, raising the question of whether they can generalize beyond the training data or simply regurgitate their training corpora. We provide the first non-vacuous generalization bounds for pretrained large language models (LLMs), indicating that language models are capable of discovering regularities that generalize to unseen data. In particular, we derive a compression bound that is valid for the unbounded log-likelihood loss, and we extend the bound to handle subsampling, accelerating bound computation on massive datasets. To achieve the extreme level of compression required for non-vacuous generalization bounds, we devise SubLoRA, a low-dimensional non-linear parameterization. Using this approach, we find that larger models have better generalization bounds and are more compressible than smaller models.
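To make the approach concrete, below is a minimal, hypothetical sketch of the two ingredients the abstract names: a SubLoRA-style parameterization, in which the low-rank (LoRA) update of a frozen pretrained linear layer is generated from a single low-dimensional trainable vector through a fixed random projection, and a generic Occam/compression bound for a loss bounded on a finite range. The names (SubLoRALinear, compression_bound), the dense Gaussian projection, the per-layer projection slice, and all dimensions are illustrative assumptions rather than the paper's implementation; in particular, the paper's actual bound also covers the unbounded log-likelihood loss and subsampled bound evaluation, which this sketch does not.

import math
import torch
import torch.nn as nn

class SubLoRALinear(nn.Module):
    """Illustrative SubLoRA-style layer: the LoRA factors A, B are not free
    parameters but are produced from a shared low-dimensional vector w via a
    fixed random projection P. Only w is trained (and later compressed)."""

    def __init__(self, base: nn.Linear, rank: int, w: nn.Parameter,
                 projection: torch.Tensor):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze pretrained weights
            p.requires_grad_(False)
        self.out_f, self.in_f = base.weight.shape
        self.rank = rank
        self.w = w                                # shared trainable subspace vector
        self.register_buffer("P", projection)     # fixed random projection slice

    def forward(self, x):
        n_a = self.out_f * self.rank
        flat = self.P @ self.w                    # low-dim vector -> LoRA params
        A = flat[:n_a].view(self.out_f, self.rank)
        B = flat[n_a:].view(self.rank, self.in_f)
        return self.base(x) + x @ (A @ B).T       # pretrained output + low-rank update

def compression_bound(emp_risk, bits, n, delta=0.05, loss_range=1.0):
    """Generic finite-hypothesis (Occam) compression bound for a loss bounded
    in [0, loss_range]: with probability at least 1 - delta,
        R(h) <= R_hat(h) + loss_range * sqrt((bits * ln 2 + ln(1/delta)) / (2n)),
    where `bits` is the prefix-free compressed size of the hypothesis. The
    paper's bound additionally handles the unbounded log-likelihood loss and
    subsampling; this helper only shows the standard bounded-loss form."""
    return emp_risk + loss_range * math.sqrt(
        (bits * math.log(2) + math.log(1.0 / delta)) / (2 * n)
    )

# Usage sketch (all sizes and values hypothetical):
# d = 256
# base = nn.Linear(768, 768)
# w = nn.Parameter(torch.zeros(d))
# n_lora = 768 * 4 + 4 * 768                     # rank-4 LoRA factors for this layer
# P = torch.randn(n_lora, d) / math.sqrt(d)
# layer = SubLoRALinear(base, rank=4, w=w, projection=P)
# bound = compression_bound(emp_risk=0.3, bits=1e6, n=1e9, loss_range=1.0)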

Cite

Text

Lotfi et al. "Non-Vacuous Generalization Bounds for Large Language Models." NeurIPS 2023 Workshops: M3L, 2023.

Markdown

[Lotfi et al. "Non-Vacuous Generalization Bounds for Large Language Models." NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/lotfi2023neuripsw-nonvacuous/)

BibTeX

@inproceedings{lotfi2023neuripsw-nonvacuous,
  title     = {{Non-Vacuous Generalization Bounds for Large Language Models}},
  author    = {Lotfi, Sanae and Finzi, Marc and Kuang, Yilun and Rudner, Tim and Goldblum, Micah and Wilson, Andrew},
  booktitle = {NeurIPS 2023 Workshops: M3L},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/lotfi2023neuripsw-nonvacuous/}
}