Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks
Abstract
Large language models can solve tasks that were not present in the training set. This capability is believed to be due to in-context learning and skill composition. In this work, we study the emergence of in-context learning and skill composition in a collection of modular arithmetic tasks. Specifically, we consider a finite collection of linear modular functions $z = a x + b y \text{ mod } p$ labeled by the vector $(a, b) \in \mathbb{Z}_p^2$. We use some of these tasks for pre-training and the rest for out-of-distribution testing. We empirically show that a GPT-style transformer exhibits a transition from in-distribution to out-of-distribution generalization as the number of pre-training tasks increases. We find that the smallest model capable of out-of-distribution generalization requires two transformer blocks, while for deeper models, the out-of-distribution generalization phase is transient, necessitating early stopping. Finally, we perform an interpretability study of the pre-trained models, revealing highly structured representations in both phases, and we discuss the learnt algorithm.
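To make the task setup concrete, below is a minimal sketch (not the authors' code) of how the modular arithmetic tasks and their pre-training/out-of-distribution split could be generated. The modulus p, the number of pre-training tasks, and the flattened (x, y, z) sequence format are illustrative assumptions rather than the paper's exact choices.

import numpy as np

# Each task is a linear modular map z = (a*x + b*y) mod p, labeled by (a, b) in Z_p^2.
# A subset of tasks is used for pre-training; held-out tasks probe OOD generalization.
p = 29                      # prime modulus (illustrative value)
n_pretrain = 256            # number of pre-training tasks; the paper sweeps this quantity
rng = np.random.default_rng(0)

# Enumerate all p^2 tasks and split them into pre-training and OOD test sets.
all_tasks = [(a, b) for a in range(p) for b in range(p)]
perm = rng.permutation(len(all_tasks))
pretrain_tasks = [all_tasks[i] for i in perm[:n_pretrain]]
ood_tasks = [all_tasks[i] for i in perm[n_pretrain:]]

def build_sequence(task, n_examples=32):
    # Build one in-context sequence of (x, y, z) triples for a single task.
    # The task label (a, b) is never shown; the model must infer it from the examples.
    a, b = task
    x = rng.integers(0, p, size=n_examples)
    y = rng.integers(0, p, size=n_examples)
    z = (a * x + b * y) % p
    return np.stack([x, y, z], axis=1).reshape(-1)  # flatten to a token stream

# Example: one pre-training sequence and one OOD-test sequence.
train_seq = build_sequence(pretrain_tasks[0])
ood_seq = build_sequence(ood_tasks[0])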
Cite
Text
He et al. "Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks." ICML 2024 Workshops: MI, 2024.Markdown
[He et al. "Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks." ICML 2024 Workshops: MI, 2024.](https://mlanthology.org/icmlw/2024/he2024icmlw-learning/)BibTeX
@inproceedings{he2024icmlw-learning,
title = {{Learning to Grok: Emergence of In-Context Learning and Skill Composition in Modular Arithmetic Tasks}},
author = {He, Tianyu and Doshi, Darshil and Das, Aritra and Gromov, Andrey},
booktitle = {ICML 2024 Workshops: MI},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/he2024icmlw-learning/}
}