Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection
Abstract
This work advances the understanding of the remarkable \emph{in-context learning} (ICL) abilities of transformers---the ability to perform new tasks when prompted with training and test examples, without any parameter update to the model. We begin by showing that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, convex risk minimization for generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Our transformer constructions admit mild bounds on the number of layers and heads, and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life: a \emph{single} transformer can adaptively select different base ICL algorithms---or even perform qualitatively different tasks---on different input sequences, without any explicit prompting of the right algorithm or task. In theory, we construct two general mechanisms for algorithm selection with concrete examples: (1) pre-ICL testing, where the transformer determines the right task for the given sequence by examining certain summary statistics of the input sequence; (2) post-ICL validation, where the transformer selects---among multiple base ICL algorithms---a near-optimal one for the given sequence using a train-validation split. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
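To make the "post-ICL validation" mechanism concrete, the following is a minimal sketch of the idea at the level of classical estimators, not the paper's transformer construction: given one in-context sequence of (x, y) examples, run several base algorithms (stand-ins for the in-context ridge and Lasso procedures analyzed in the paper) on a train split, score them on a validation split, and predict on the test input with the best-scoring one. The data-generating setup and hyperparameters below are illustrative assumptions.

```python
# Sketch of post-ICL validation via a train-validation split (conceptual only;
# the paper implements this selection inside a transformer's forward pass).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)

# Hypothetical in-context data: a sparse linear task with noise,
# so the Lasso-like base algorithm should be selected.
n, d = 40, 20
w_star = np.zeros(d)
w_star[:3] = rng.normal(size=3)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)
x_test = rng.normal(size=d)

# Base ICL algorithms to select among (illustrative choices and penalties).
base_algorithms = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.05),
}

# Train-validation split of the in-context examples.
n_train = int(0.75 * n)
X_tr, y_tr = X[:n_train], y[:n_train]
X_val, y_val = X[n_train:], y[n_train:]

# Fit each base algorithm on the train split, score on the validation split.
val_errors = {}
for name, model in base_algorithms.items():
    model.fit(X_tr, y_tr)
    val_errors[name] = np.mean((model.predict(X_val) - y_val) ** 2)

# Select the near-optimal base algorithm and predict on the test input.
best = min(val_errors, key=val_errors.get)
prediction = base_algorithms[best].predict(x_test[None, :])[0]
print(f"validation errors: {val_errors}")
print(f"selected algorithm: {best}, prediction on x_test: {prediction:.3f}")
```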
Cite

Text
Bai et al. "Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection." ICML 2023 Workshops: ES-FoMO, 2023.

Markdown
[Bai et al. "Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/bai2023icmlw-transformers/)

BibTeX
@inproceedings{bai2023icmlw-transformers,
title = {{Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection}},
author = {Bai, Yu and Chen, Fan and Wang, Huan and Xiong, Caiming and Mei, Song},
booktitle = {ICML 2023 Workshops: ES-FoMO},
year = {2023},
url = {https://mlanthology.org/icmlw/2023/bai2023icmlw-transformers/}
}