An Information-Theoretic Analysis of In-Context Learning

Abstract

Previous theoretical results pertaining to meta-learning on sequences build on contrived and convoluted mixing time assumptions. We introduce new information-theoretic tools that lead to a concise yet general decomposition of error for a Bayes optimal predictor into two components: meta-learning error and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers and corroborate existing results in a simple linear setting. Our theoretical results characterize how error decays in both the number of training sequences and the sequence length. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.
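
Schematically, and using placeholder notation rather than the paper's own, the decomposition described above says that the Bayes optimal predictor's estimation error over $M$ training sequences of length $T$ splits into a meta-learning term that shrinks as more sequences are observed and an intra-task term that shrinks as each sequence grows longer:

$$
\underbrace{\mathcal{L}_{M,T} - \mathcal{L}^{*}}_{\text{estimation error}}
\;=\;
\underbrace{\varepsilon_{\mathrm{meta}}(M,T)}_{\text{meta-learning error, decays in } M}
\;+\;
\underbrace{\varepsilon_{\mathrm{task}}(T)}_{\text{intra-task error, decays in } T}
$$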

Cite

Text

Jeon et al. "An Information-Theoretic Analysis of In-Context Learning." International Conference on Machine Learning, 2024.

Markdown

[Jeon et al. "An Information-Theoretic Analysis of In-Context Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/jeon2024icml-informationtheoretic/)

BibTeX

@inproceedings{jeon2024icml-informationtheoretic,
  title     = {{An Information-Theoretic Analysis of In-Context Learning}},
  author    = {Jeon, Hong Jun and Lee, Jason D. and Lei, Qi and Van Roy, Benjamin},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {21522--21554},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/jeon2024icml-informationtheoretic/}
}