An Information-Theoretic Analysis of In-Context Learning

Abstract

Previous theoretical results pertaining to meta-learning on sequences build on contrived and convoluted mixing time assumptions. We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers and corroborate existing results in a simple linear setting. Our theoretical results characterize how error decays in both the number of training sequences and the sequence length. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.
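
Schematically, and using placeholder notation rather than the paper's own, the decomposition described above says that the Bayes optimal predictor's estimation error over $M$ training sequences of length $T$ splits into a meta-learning term that shrinks as more sequences are observed and an intra-task term that shrinks as each sequence grows longer:

$$
\underbrace{\mathcal{L}_{M,T} - \mathcal{L}^{*}}_{\text{estimation error}}
\;=\;
\underbrace{\varepsilon_{\mathrm{meta}}(M,T)}_{\text{meta-learning error, decays in } M}
\;+\;
\underbrace{\varepsilon_{\mathrm{task}}(T)}_{\text{intra-task error, decays in } T}
$$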

Cite

Text

Jeon et al. "An Information-Theoretic Analysis of In-Context Learning." International Conference on Machine Learning, 2024.

Markdown

[Jeon et al. "An Information-Theoretic Analysis of In-Context Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jeon2024icml-informationtheoretic/)

BibTeX

@inproceedings{jeon2024icml-informationtheoretic,
  title     = {{An Information-Theoretic Analysis of In-Context Learning}},
  author    = {Jeon, Hong Jun and Lee, Jason D. and Lei, Qi and Van Roy, Benjamin},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {21522--21554},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/jeon2024icml-informationtheoretic/}
}