Spectral Conditioning of Attention Improves Transformer Performance

Abstract

We present a theoretical analysis of the Jacobian of an attention block within a transformer, showing that it is governed by the query, key, and value projections that define the attention mechanism. Leveraging this insight, we introduce a method that systematically alters the spectral properties of each attention layer to reduce the Jacobian's condition number, thereby improving the overall conditioning of the attention layers within a transformer network. We empirically show that this improved Jacobian conditioning translates to enhanced performance in practice. Our approach is simple, broadly applicable, and can be easily integrated as a drop-in replacement for a wide range of existing attention mechanisms. We validate its effectiveness across diverse transformer architectures and tasks, demonstrating consistent improvements in performance.
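To make the idea of spectrally conditioning an attention layer concrete, the following is a minimal, hypothetical sketch (not the authors' exact method): it clips the singular values of the query, key, and value projection matrices so that each matrix's condition number stays below a chosen cap, which is one simple way to improve the conditioning that the abstract describes. The class name, the `max_cond` cap, and the use of PyTorch's `nn.MultiheadAttention` are all illustrative assumptions.

import torch
import torch.nn as nn


def condition_weights(weight: torch.Tensor, max_cond: float = 10.0) -> torch.Tensor:
    """Return a copy of `weight` whose singular values are clipped so that
    sigma_max / sigma_min <= max_cond (a crude spectral-conditioning step)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    S_clipped = S.clamp(min=S.max().item() / max_cond)
    return U @ torch.diag(S_clipped) @ Vh


class SpectrallyConditionedAttention(nn.Module):
    """Multi-head attention whose Q/K/V projection matrices are re-conditioned
    before each forward pass (illustrative only, not the paper's algorithm)."""

    def __init__(self, dim: int, num_heads: int, max_cond: float = 10.0):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.max_cond = max_cond

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():
            # in_proj_weight stacks W_q, W_k, W_v row-wise; condition each block.
            w = self.attn.in_proj_weight
            d = w.shape[0] // 3
            for i in range(3):
                block = w[i * d:(i + 1) * d]
                block.copy_(condition_weights(block, self.max_cond))
        out, _ = self.attn(x, x, x, need_weights=False)
        return out


if __name__ == "__main__":
    layer = SpectrallyConditionedAttention(dim=64, num_heads=4)
    x = torch.randn(2, 16, 64)   # (batch, sequence length, embedding dim)
    print(layer(x).shape)        # torch.Size([2, 16, 64])

In this sketch the conditioning is applied as an in-place projection of the weights at forward time; a drop-in replacement as described in the abstract could equally apply such a constraint during initialization or optimization.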

Cite

Text

Saratchandran and Lucey. "Spectral Conditioning of Attention Improves Transformer Performance." Advances in Neural Information Processing Systems, 2025.

Markdown

[Saratchandran and Lucey. "Spectral Conditioning of Attention Improves Transformer Performance." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/saratchandran2025neurips-spectral/)

BibTeX

@inproceedings{saratchandran2025neurips-spectral,
  title     = {{Spectral Conditioning of Attention Improves Transformer Performance}},
  author    = {Saratchandran, Hemanth and Lucey, Simon},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/saratchandran2025neurips-spectral/}
}