Bayes Optimal Learning of Attention-Indexed Models

Abstract

We introduce the attention-indexed model (AIM), a theoretical framework for analyzing learning in deep attention layers. Inspired by multi-index models, AIM captures how token-level outputs emerge from layered bilinear interactions over high-dimensional embeddings. Unlike prior tractable attention models, AIM allows full-width key and query matrices, aligning more closely with practical transformers. Using tools from statistical mechanics and random matrix theory, we derive closed-form predictions for the Bayes-optimal generalization error and identify sharp phase transitions as a function of sample complexity, model width, and sequence length. We propose a matching approximate message passing algorithm and show that gradient descent can reach optimal performance. AIM offers a solvable playground for understanding learning in self-attention layers, which are key components of modern architectures.
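To make the setup concrete, below is a minimal NumPy sketch of the kind of bilinear attention interaction the abstract describes: token-level outputs driven by scores of the form $X W_Q W_K^\top X^\top$ with full-width key and query matrices. The shapes, scalings, and the softmax readout here are illustrative assumptions, not the paper's exact definition of AIM.

```python
import numpy as np

rng = np.random.default_rng(0)

L, d = 16, 128  # sequence length, embedding dimension (illustrative sizes)
X = rng.standard_normal((L, d)) / np.sqrt(d)  # high-dimensional token embeddings

# Full-width (d x d) query and key matrices, as AIM allows
W_Q = rng.standard_normal((d, d)) / np.sqrt(d)
W_K = rng.standard_normal((d, d)) / np.sqrt(d)

# Bilinear attention scores: an L x L matrix of token-pair interactions
S = X @ W_Q @ W_K.T @ X.T

# Token-level outputs via a row-wise softmax of the scores (assumed readout)
A = np.exp(S - S.max(axis=1, keepdims=True))
A /= A.sum(axis=1, keepdims=True)
Y = A @ X  # (L x d) output tokens
```

In the theoretical analysis, learning such a model from sample pairs $(X, Y)$ amounts to inferring the planted matrices $W_Q, W_K$, which is where the multi-index-model analogy and the Bayes-optimal analysis enter.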

Cite

Text

Boncoraglio et al. "Bayes Optimal Learning of Attention-Indexed Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Boncoraglio et al. "Bayes Optimal Learning of Attention-Indexed Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/boncoraglio2025neurips-bayes/)

BibTeX

@inproceedings{boncoraglio2025neurips-bayes,
  title     = {{Bayes Optimal Learning of Attention-Indexed Models}},
  author    = {Boncoraglio, Fabrizio and Troiani, Emanuele and Erba, Vittorio and Zdeborova, Lenka},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/boncoraglio2025neurips-bayes/}
}