Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning

Abstract

Transformer-based large language models have displayed impressive capabilities in in-context learning, wherein they use multiple input-output demonstration pairs in a prompt to make predictions on unlabeled test data. To lay theoretical groundwork for in-context learning, we study the optimization and generalization of a single-head, one-layer Transformer under multi-task learning for classification. Our analysis shows that the sample complexity decreases when prompts contain a larger fraction of training-relevant features and less noise, which in turn improves learning performance. The trained model implements a mechanism that first attends to demonstrations carrying training-relevant features and then decodes the corresponding label embedding. Furthermore, we characterize the conditions on the relationship between training and testing prompts that are necessary for successful out-of-domain generalization in in-context learning.
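To make the described mechanism concrete, below is a minimal numerical sketch, not the paper's construction or learned weights, of a single-head, one-layer softmax-attention model performing in-context classification: the query token attends to demonstrations that share its task-relevant feature component and then reads off their label embeddings. All dimensions, the toy labeling rule, and the choice of query/key map are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): feature dimension d, N demonstrations.
d, N = 8, 20

# Toy binary task: the label is the sign of the projection onto one task-relevant feature.
relevant = rng.standard_normal(d)
relevant /= np.linalg.norm(relevant)
X = rng.standard_normal((N, d))          # demonstration inputs
y = np.sign(X @ relevant)                # demonstration labels in {-1, +1}
x_query = rng.standard_normal(d)         # unlabeled query input

# Prompt tokens stack each input with its scalar label embedding; the query's label slot is zero.
P = np.hstack([X, y[:, None]])                 # (N, d+1) demonstration tokens
q_token = np.concatenate([x_query, [0.0]])     # (d+1,) query token

# Single-head, one-layer softmax attention.  As an illustrative (assumed) choice, the
# query/key map projects onto the task-relevant direction, so attention concentrates on
# demonstrations whose relevant-feature component matches the query's.
W_qk = np.zeros((d + 1, d + 1))
W_qk[:d, :d] = np.outer(relevant, relevant)
scores = P @ W_qk @ q_token                    # one attention logit per demonstration
attn = np.exp(scores - scores.max())
attn /= attn.sum()

# The prediction decodes the label embeddings of the attended demonstrations.
prediction = np.sign(attn @ y)
print("true label:", np.sign(x_query @ relevant), "in-context prediction:", prediction)
```

In this sketch the attention weights play the role of a feature selector: demonstrations aligned with the query along the task-relevant direction dominate the softmax, so the decoded label follows theirs even though the query itself is unlabeled.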

Cite

Text

Li et al. "Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning." NeurIPS 2023 Workshops: M3L, 2023.

Markdown

[Li et al. "Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning." NeurIPS 2023 Workshops: M3L, 2023.](https://mlanthology.org/neuripsw/2023/li2023neuripsw-transformers/)

BibTeX

@inproceedings{li2023neuripsw-transformers,
  title     = {{Transformers as Multi-Task Feature Selectors: Generalization Analysis of In-Context Learning}},
  author    = {Li, Hongkang and Wang, Meng and Lu, Songtao and Wan, Hui and Cui, Xiaodong and Chen, Pin-Yu},
  booktitle = {NeurIPS 2023 Workshops: M3L},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/li2023neuripsw-transformers/}
}