A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention

Abstract

A theoretical understanding of how algorithmic abilities emerge in the learning of language models remains elusive. In this work, we provide a tight theoretical analysis of the emergence of semantic attention in a solvable model of dot-product attention. We consider a non-linear self-attention layer with trainable tied and low-rank query and key matrices. In the asymptotic limit of high-dimensional data and a comparably large number of training samples, we provide a tight closed-form characterization of the global minimum of the non-convex empirical loss landscape. We show that this minimum corresponds to either a positional attention mechanism (with tokens attending to each other based on their respective positions) or a semantic attention mechanism (with tokens attending to each other based on their meaning), and we evidence an emergent phase transition from the former to the latter with increasing sample complexity.
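For intuition, the following is a minimal sketch of the kind of layer the abstract describes: a single dot-product self-attention layer whose query and key matrices are tied and low-rank. The function name, shapes, and the softmax normalization are illustrative assumptions, not the paper's exact model (which additionally handles positional information and a specific high-dimensional data model, omitted here).

import numpy as np

def tied_lowrank_attention(X, Q):
    """One self-attention layer with tied, low-rank query/key weights.

    X : (L, d) array of token embeddings (one row per token).
    Q : (d, r) trainable low-rank matrix, used for both queries and keys.
    (Illustrative sketch only; names and normalization are assumptions.)
    """
    d = X.shape[1]
    XQ = X @ Q                                   # project tokens to rank-r space
    scores = XQ @ XQ.T / np.sqrt(d)              # tied dot-product attention scores
    scores -= scores.max(axis=1, keepdims=True)  # numerically stable softmax
    A = np.exp(scores)
    A /= A.sum(axis=1, keepdims=True)            # row-wise attention weights
    return A @ X                                 # attended token representations

# Example: L = 4 tokens in d = 16 dimensions, rank r = 2
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 16)) / np.sqrt(16)
Q = rng.standard_normal((16, 2))
print(tied_lowrank_attention(X, Q).shape)  # (4, 16)

Whether the learned Q encodes positions or semantics is exactly the distinction the paper's phase transition characterizes.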

Cite

Text

Cui et al. "A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention." ICML 2024 Workshops: HiLD, 2024.

Markdown

[Cui et al. "A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention." ICML 2024 Workshops: HiLD, 2024.](https://mlanthology.org/icmlw/2024/cui2024icmlw-phase/)

BibTeX

@inproceedings{cui2024icmlw-phase,
  title     = {{A Phase Transition Between Positional and Semantic Learning in a Solvable Model of Dot-Product Attention}},
  author    = {Cui, Hugo and Behrens, Freya and Krzakala, Florent and Zdeborov\'{a}, Lenka},
  booktitle = {ICML 2024 Workshops: HiLD},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/cui2024icmlw-phase/}
}