CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers

Abstract

Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes $O(N^2)$ complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves $O(N \log N)$ computations, requires fewer learnable parameters by streamlining fully connected layers, and introduces no heavier operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations. Based on the engineering-isomorphic transformer framework, CAT's design not only offers practical efficiency and ease of implementation, but also provides insights to guide the development of future high-performance Transformer architectures. Finally, our ablation studies highlight the key conditions underlying CAT's success, shedding light on broader principles for scalable attention mechanisms.
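To make the core mechanism concrete, below is a minimal PyTorch sketch of FFT-based circular convolution used as a token-mixing step. This is not the paper's exact CAT layer; the function name `circular_conv_attention` and the use of a single shared, softmax-normalized filter `w` are assumptions for illustration. It only demonstrates how the convolution theorem turns an $O(N^2)$ circulant "attention map" into an $O(N \log N)$ frequency-domain product.

```python
import torch

def circular_conv_attention(v: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Mix tokens with a circular convolution along the sequence axis.

    By the convolution theorem, a length-N circular convolution costs
    O(N log N) via the FFT instead of the O(N^2) of dense attention.

    v: (batch, seq_len, dim) token features (value-like activations)
    w: (seq_len,) circular filter playing the role of attention weights
    """
    n = v.size(1)
    V = torch.fft.rfft(v, dim=1)          # (batch, n//2 + 1, dim)
    W = torch.fft.rfft(w)                 # (n//2 + 1,)
    # Pointwise product in frequency space == circular convolution in time.
    return torch.fft.irfft(V * W.unsqueeze(-1), n=n, dim=1)

# Cross-check against the explicit O(N^2) circulant matrix-vector product.
B, N, D = 2, 512, 64
v = torch.randn(B, N, D)
w = torch.softmax(torch.randn(N), dim=0)  # normalized, attention-like filter
fast = circular_conv_attention(v, w)

idx = (torch.arange(N).unsqueeze(1) - torch.arange(N)) % N  # C[i, j] = w[(i-j) mod N]
C = w[idx]                                # (N, N) circulant "attention map"
slow = torch.einsum('ij,bjd->bid', C, v)
assert torch.allclose(fast, slow, atol=1e-4)
```

The final assertion verifies that the frequency-domain product matches the naive circulant matmul, which is the sense in which circular convolution can stand in for a (structured) attention map at sub-quadratic cost.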

Cite

Text

Yamada. "CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers." Advances in Neural Information Processing Systems, 2025.

Markdown

[Yamada. "CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/yamada2025neurips-cat/)

BibTeX

@inproceedings{yamada2025neurips-cat,
  title     = {{CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers}},
  author    = {Yamada, Yoshihiro},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/yamada2025neurips-cat/}
}