Learning In-Context $n$-Grams with Transformers: Sub-$n$-Grams Are Near-Stationary Points

Abstract

Motivated by empirical observations of prolonged plateaus and stage-wise progression during training, we investigate the loss landscape of transformer models trained on in-context next-token prediction tasks. In particular, we focus on learning in-context $n$-gram language models under cross-entropy loss, and establish a sufficient condition for parameter configurations to be stationary points. We then construct a set of parameter configurations for a simplified transformer model that represent $k$-gram estimators (for $k \leq n$), and show that the gradient of the population loss at these solutions vanishes in the limit of infinite sequence length and parameter norm. This reveals a key property of the loss landscape: sub-$n$-grams are near-stationary points of the population cross-entropy loss, offering theoretical insight into widely observed phenomena such as stage-wise learning dynamics and emergent phase transitions. These insights are further supported by numerical experiments that illustrate the learning dynamics of $n$-grams, characterized by discrete transitions between near-stationary solutions.
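The $k$-gram estimators mentioned above can be viewed as counting-based next-token predictors computed from the context itself. Below is a minimal sketch of such an in-context estimator; the function name, the add-$\alpha$ smoothing, and the toy sequence are illustrative assumptions, not constructions from the paper.

```python
import numpy as np

def kgram_in_context_estimate(context, k, vocab_size, alpha=1.0):
    """Empirical k-gram next-token distribution from a single context.

    Counts how often each token follows the last (k-1) tokens of the
    context, within the context itself, with add-alpha smoothing.
    (Illustrative sketch; not the paper's transformer construction.)
    """
    suffix = tuple(context[len(context) - (k - 1):]) if k > 1 else ()
    counts = np.full(vocab_size, alpha)
    # Scan the context for occurrences of the suffix and tally the next token.
    for i in range(len(context) - (k - 1)):
        if tuple(context[i:i + k - 1]) == suffix:
            counts[context[i + k - 1]] += 1
    return counts / counts.sum()

# Example: bigram (k=2) estimate on a toy sequence over vocab {0, 1, 2}.
ctx = [0, 1, 2, 0, 1, 0, 1]
p = kgram_in_context_estimate(ctx, k=2, vocab_size=3)
```

On this toy context, the token `1` is followed once by `2` and once by `0`, so with smoothing the estimate places equal, elevated mass on `0` and `2`. A transformer representing such an estimator attends to positions whose preceding $k-1$ tokens match the current suffix, which is the kind of parameter configuration the abstract analyzes.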

Cite

Text

Varre et al. "Learning In-Context $n$-Grams with Transformers: Sub-$n$-Grams Are Near-Stationary Points." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Varre et al. "Learning In-Context $n$-Grams with Transformers: Sub-$n$-Grams Are Near-Stationary Points." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/varre2025icml-learning/)

BibTeX

@inproceedings{varre2025icml-learning,
  title     = {{Learning In-Context $n$-Grams with Transformers: Sub-$n$-Grams Are Near-Stationary Points}},
  author    = {Varre, Aditya and Yüce, Gizem and Flammarion, Nicolas},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {60924--60963},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/varre2025icml-learning/}
}