Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Abstract
Chain-of-thought reasoning and scratchpads have emerged as critical tools for enhancing the computational capabilities of transformers. While theoretical results show that polynomial-length scratchpads can extend transformers’ expressivity from $TC^0$ to $PTIME$, the required scratchpad length remains poorly understood. Empirical evidence suggests that transformers need scratchpads even for many problems in $TC^0$, such as Parity or Multiplication, challenging optimistic bounds derived from circuit complexity. In this work, we initiate the systematic study of lower bounds on the number of CoT steps across different algorithmic problems in the hard-attention regime. We study a variety of algorithmic problems and provide bounds that are tight up to logarithmic factors. Overall, these results contribute to an emerging understanding of the power and limitations of chain-of-thought reasoning.
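As an illustrative aside (not taken from the paper), the following minimal Python sketch shows what "number of CoT steps" means for a problem like Parity: a standard scratchpad construction emits one running-parity token per input bit, so the chain-of-thought length grows linearly with the input size. The function name and token format are hypothetical choices for this sketch.

# Illustrative sketch (hypothetical, not the paper's construction):
# a chain-of-thought style scratchpad for Parity, emitting one
# running-parity token per input bit, i.e. n CoT steps for n bits.

def parity_with_scratchpad(bits):
    """Return (scratchpad, answer) for the parity of `bits`."""
    scratchpad = []                     # one token per CoT step
    running = 0
    for b in bits:
        running ^= b                    # fold in the next input bit
        scratchpad.append(running)      # emit the intermediate result as a step
    return scratchpad, running

if __name__ == "__main__":
    steps, ans = parity_with_scratchpad([1, 0, 1, 1])
    print(steps, ans)                   # [1, 1, 0, 1] 1 -> 4 CoT steps for 4 input bits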
Cite
Text
Bavandpour et al. "Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers." Proceedings of the 42nd International Conference on Machine Learning, 2025.
Markdown
[Bavandpour et al. "Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/bavandpour2025icml-lower/)
BibTeX
@inproceedings{bavandpour2025icml-lower,
  title = {{Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers}},
  author = {Bavandpour, Alireza Amiri and Huang, Xinting and Rofin, Mark and Hahn, Michael},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year = {2025},
  pages = {3274-3306},
  volume = {267},
  url = {https://mlanthology.org/icml/2025/bavandpour2025icml-lower/}
}