Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition

He, Zhengfu; Wang, Junxuan; Lin, Rui; Ge, Xuyang; Shu, Wentao; Tang, Qiong; Zhang, Junping; Qiu, Xipeng

ICLR 2026

Abstract

We propose Low-Rank Sparse Attention (Lorsa), a sparse replacement model for Transformer attention layers that disentangles the original Multi-Head Self-Attention (MHSA) into individually comprehensible components. Lorsa is designed to address the challenge of attention superposition, so that attention-mediated interactions between features at different token positions can be understood. Lorsa finds cleaner and finer-grained versions of previously discovered MHSA behaviors such as induction heads, successor heads, and attention sinks, as well as a comprehensive family of arithmetic-specific Lorsa heads. Interestingly, we identify a novel head type, subtoken induction heads, that operates at the character level rather than the token level. Automated interpretability analysis indicates that Lorsa matches sparse autoencoders (SAEs) in interpretability while exhibiting superior circuit discovery properties. We also conduct extensive experiments on architectural design ablations, correlation with the original MHSA heads, and error analysis. Our early attempt to fully sparsify a toy Transformer succeeds in revealing clean global circuits. Ultimately, we hope Lorsa will substantially improve our understanding of attention computation and, together with its MLP counterparts, enable full sparsification of model computation. Lorsa is open-sourced at https://anonymous.4open.science/r/Lorsa-5686/.

Cite

Text

He et al. "Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition." International Conference on Learning Representations, 2026.

Markdown

[He et al. "Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/he2026iclr-understanding/)

BibTeX

@inproceedings{he2026iclr-understanding,
  title     = {{Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition}},
  author    = {He, Zhengfu and Wang, Junxuan and Lin, Rui and Ge, Xuyang and Shu, Wentao and Tang, Qiong and Zhang, Junping and Qiu, Xipeng},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/he2026iclr-understanding/}
}