Sublinear Time Quantum Algorithm for Attention Approximation

Abstract

Given the query, key and value matrices $Q, K, V\in \mathbb{R}^{n\times d}$, the attention matrix is defined as $\mathrm{Att}(Q, K, V)=D^{-1}AV$ where $A=\exp(QK^\top/\sqrt{d})$ with $\exp(\cdot)$ applied entrywise, $D=\mathrm{diag}(A{\bf 1}_n)$. The attention matrix is the backbone of modern transformers and large language models, but explicitly forming the softmax matrix $D^{-1}A$ incurs $\Omega(n^2)$, motivating numerous approximation schemes that reduce runtime to $\widetilde O(nd)$ via sparsity or low-rank factorization. We propose a quantum data structure that approximates any row of $\mathrm{Att}(Q, K, V)$ using only row queries to $Q, K, V$. Our algorithm preprocesses these matrices in $\widetilde{O}\left( \epsilon^{-1} n^{0.5} \left( s_\lambda^{2.5} + s_\lambda^{1.5} d + \alpha^{0.5} d \right) \right)$ time, where $\epsilon$ is the target accuracy, $s_\lambda$ is the $\lambda$-statistical dimension of the exponential kernel defined by $Q$ and $K$, and $\alpha$ measures the row distortion of $V$ that is at most $d/{\rm srank}(V)$, the stable rank of $V$. Each row query can be answered in $\widetilde{O}(s_\lambda^2 + s_\lambda d)$ time. To our knowledge, this is the first quantum data structure that approximates rows of the attention matrix in sublinear time with respect to $n$. Our approach relies on a quantum Nystr{\"o}m approximation of the exponential kernel, quantum multivariate mean estimation for computing $D$, and quantum leverage score sampling for the multiplication with $V$.

Cite

Text

Song et al. "Sublinear Time Quantum Algorithm for Attention Approximation." International Conference on Learning Representations, 2026.

Markdown

[Song et al. "Sublinear Time Quantum Algorithm for Attention Approximation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/song2026iclr-sublinear/)

BibTeX

@inproceedings{song2026iclr-sublinear,
  title     = {{Sublinear Time Quantum Algorithm for Attention Approximation}},
  author    = {Song, Zhao and Xue, Jianfei and Zhang, Jiahao and Zhang, Lichen},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/song2026iclr-sublinear/}
}