Transformers in Uniform TC$^0$

Abstract

Previous work has shown that the languages recognized by average-hard attention transformers (AHATs) and softmax-attention transformers (SMATs) are within the circuit complexity class TC$^0$. However, these results assume limited-precision arithmetic: using floating-point numbers with O(log n) bits (where n is the length of the input string), Strobl showed that AHATs can be approximated in L-uniform TC$^0$, and Merrill and Sabharwal showed that SMATs can be approximated in DLOGTIME-uniform TC$^0$. Here, we improve these results, showing that AHATs with no approximation, SMATs with O(poly(n)) bits of floating-point precision, and SMATs with at most $2^{-O(\mathrm{poly}(n))}$ absolute error are all in DLOGTIME-uniform TC$^0$.

Cite

Text

Chiang. "Transformers in Uniform TC$^0$." Transactions on Machine Learning Research, 2025.

Markdown

[Chiang. "Transformers in Uniform TC$^0$." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/chiang2025tmlr-transformers/)

BibTeX

@article{chiang2025tmlr-transformers,
  title     = {{Transformers in Uniform TC$^0$}},
  author    = {Chiang, David},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/chiang2025tmlr-transformers/}
}