Probability Distributions Computed by Autoregressive Transformers

Yang, Andy; Svete, Anej; Li, Jiaoda; Lin, Anthony Widjaja; Rawski, Jonathan; Cotterell, Ryan; Chiang, David

ICLR 2026

Abstract

Most expressivity results for transformers treat them as language recognizers—devices that accept or reject strings—rather than as they are used in practice: as language models that generate strings autoregressively and probabilistically. We characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case. Our overall contribution is to tease apart what functions transformers are capable of expressing in their most common use case as language models.
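
The abstract contrasts recognizers, which map a string to accept or reject, with language models, which assign each string a probability by generating it one symbol at a time. The minimal Python sketch below shows the standard way an autoregressive model scores a complete string: it multiplies next-symbol probabilities and then multiplies by the probability of an end-of-string symbol. It is only an illustration under stated assumptions. The end-of-string symbol `$`, the toy model, and every function name are hypothetical and are not the paper's constructions. The toy model is not a transformer. Reading a recognizer off the support (accept iff p(w) > 0) is one natural choice, assumed here for illustration.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict

EOS = "$"  # hypothetical end-of-string symbol (an assumption, not from the paper)

# Hypothetical type: maps a prefix to a distribution over next symbols, EOS included.
NextSymbolDist = Callable[[str], Dict[str, Fraction]]


def string_prob(model: NextSymbolDist, w: str) -> Fraction:
    """Probability of the complete string w: p(w) = prod_t p(w_t | w_<t) * p(EOS | w)."""
    p = Fraction(1)
    for t, sym in enumerate(w):
        p *= model(w[:t]).get(sym, Fraction(0))  # condition on the prefix w[:t]
    return p * model(w).get(EOS, Fraction(0))  # the model must also choose to stop


def support_recognizer(model: NextSymbolDist) -> Callable[[str], bool]:
    """Assumed construction: accept w iff the model gives it nonzero probability."""
    return lambda w: string_prob(model, w) > 0


def toy_model(prefix: str) -> Dict[str, Fraction]:
    """Hypothetical toy model whose support is the language a*b*."""
    if "b" in prefix:
        # Once a 'b' has been generated, only 'b' or end-of-string may follow.
        return {"b": Fraction(1, 2), EOS: Fraction(1, 2)}
    return {"a": Fraction(1, 3), "b": Fraction(1, 3), EOS: Fraction(1, 3)}


def mass_up_to(model: NextSymbolDist, alphabet: str, max_len: int) -> Fraction:
    """Total probability of all strings of length <= max_len (brute-force enumeration)."""
    return sum(
        (string_prob(model, "".join(w))
         for n in range(max_len + 1)
         for w in product(alphabet, repeat=n)),
        Fraction(0),
    )


if __name__ == "__main__":
    accepts = support_recognizer(toy_model)
    print(string_prob(toy_model, "ab"))    # 1/18
    print(string_prob(toy_model, "ba"))    # 0
    print(accepts("aab"), accepts("aba"))  # True False
    # For this toy model the total mass approaches 1 as max_len grows.
    print(float(mass_up_to(toy_model, "ab", 12)))
```

Exact `Fraction` arithmetic keeps the small probabilities exact. The brute-force `mass_up_to` check is only meant to show that this particular toy model defines a proper distribution over finite strings. It enumerates every string up to the length bound, so it does not scale beyond tiny alphabets and lengths.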

Cite

Text

Yang et al. "Probability Distributions Computed by Autoregressive Transformers." International Conference on Learning Representations, 2026.

BibTeX

@inproceedings{yang2026iclr-probability,
  title     = {{Probability Distributions Computed by Autoregressive Transformers}},
  author    = {Yang, Andy and Svete, Anej and Li, Jiaoda and Lin, Anthony Widjaja and Rawski, Jonathan and Cotterell, Ryan and Chiang, David},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/yang2026iclr-probability/}
}