CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers

Abstract

Transformers have driven remarkable breakthroughs in natural language processing and computer vision, yet their standard attention mechanism still imposes $O(N^2)$ complexity, hindering scalability to longer sequences. We introduce Circular-convolutional ATtention (CAT), a Fourier-based approach that efficiently applies circular convolutions to reduce complexity without sacrificing representational power. CAT achieves $O(N \log N)$ computations, requires fewer learnable parameters by streamlining fully connected layers, and introduces no heavier operations, resulting in consistent accuracy improvements and about a 10% speedup in naive PyTorch implementations. Based on the engineering-isomorphic transformer framework, CAT's design not only offers practical efficiency and ease of implementation, but also provides insights to guide the development of future high-performance Transformer architectures. Finally, our ablation studies highlight the key conditions underlying CAT's success, shedding light on broader principles for scalable attention mechanisms.
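To make the core mechanism concrete, below is a minimal PyTorch sketch of FFT-based circular convolution used as a token-mixing step. This is not the paper's exact CAT layer; the function name `circular_conv_attention` and the use of a single shared, softmax-normalized filter `w` are assumptions for illustration. It only demonstrates how the convolution theorem turns an $O(N^2)$ circulant "attention map" into an $O(N \log N)$ frequency-domain product.

```python
import torch

def circular_conv_attention(v: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Mix tokens with a circular convolution along the sequence axis.

    By the convolution theorem, a length-N circular convolution costs
    O(N log N) via the FFT instead of the O(N^2) of dense attention.

    v: (batch, seq_len, dim) token features (value-like activations)
    w: (seq_len,) circular filter playing the role of attention weights
    """
    n = v.size(1)
    V = torch.fft.rfft(v, dim=1)          # (batch, n//2 + 1, dim)
    W = torch.fft.rfft(w)                 # (n//2 + 1,)
    # Pointwise product in frequency space == circular convolution in time.
    return torch.fft.irfft(V * W.unsqueeze(-1), n=n, dim=1)

# Cross-check against the explicit O(N^2) circulant matrix-vector product.
B, N, D = 2, 512, 64
v = torch.randn(B, N, D)
w = torch.softmax(torch.randn(N), dim=0)  # normalized, attention-like filter
fast = circular_conv_attention(v, w)

idx = (torch.arange(N).unsqueeze(1) - torch.arange(N)) % N  # C[i, j] = w[(i-j) mod N]
C = w[idx]                                # (N, N) circulant "attention map"
slow = torch.einsum('ij,bjd->bid', C, v)
assert torch.allclose(fast, slow, atol=1e-4)
```

The final assertion verifies that the frequency-domain product matches the naive circulant matmul, which is the sense in which circular convolution can stand in for a (structured) attention map at sub-quadratic cost.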

Cite

Text

Yamada. "CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers." Advances in Neural Information Processing Systems, 2025.

Markdown

[Yamada. "CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/yamada2025neurips-cat/)

BibTeX

@inproceedings{yamada2025neurips-cat,
  title     = {{CAT: Circular-Convolutional Attention for Sub-Quadratic Transformers}},
  author    = {Yamada, Yoshihiro},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/yamada2025neurips-cat/}
}