SoftMax Transformers Are Turing-Complete

Abstract

Hard attention Chain-of-Thought (CoT) transformers are known to be Turing-complete. However, it is an open problem whether softmax attention Chain-of-Thought (CoT) transformers are Turing-complete. In this paper, we prove a stronger result that length-generalizable softmax CoT transformers are Turing-complete. More precisely, our Turing-completeness proof goes via the CoT extension of the Counting RASP (C-RASP), which correspond to softmax CoT transformers that admit length generalization. We prove Turing-completeness for CoT C-RASP with causal masking over a unary alphabet (more generally, for the letter-bounded languages). While we show that this is actually not Turing-complete for arbitrary languages, we prove that its extension with relative positional encoding is Turing-complete for arbitrary languages. We empirically validate our theoretical results by training transformers for various languages that require complex (non-linear) arithmetic reasoning.

Cite

Text

Jiang et al. "SoftMax Transformers Are Turing-Complete." International Conference on Learning Representations, 2026.

Markdown

[Jiang et al. "SoftMax Transformers Are Turing-Complete." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/jiang2026iclr-softmax/)

BibTeX

@inproceedings{jiang2026iclr-softmax,
  title     = {{SoftMax Transformers Are Turing-Complete}},
  author    = {Jiang, Hongjian and Hahn, Michael and Zetzsche, Georg and Lin, Anthony Widjaja},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/jiang2026iclr-softmax/}
}