How Sparse Attention Approximates Exact Attention? Your Attention Is Naturally $n^C$-Sparse
Abstract
Sparse Attention is a technique that approximates standard attention computation with sub-quadratic complexity. This is achieved by selectively ignoring the smaller entries of the attention matrix during the softmax computation. Variations of this technique, such as KV-cache pruning, sparsity-based fast attention, and the Sparse Transformer, have been widely adopted for efficient Large Language Model (LLM) deployment. Despite this widespread use, a theoretical understanding of the conditions under which sparse attention performs on par with exact attention remains elusive. This work aims to **bridge this gap by examining the inherent sparsity of standard attention processes**. Our theoretical framework reveals several key insights:

* Attention is $n^{C}$-sparse: keeping only the largest $\Omega(n^{C})$ of the $n$ entries in each row is sufficient for sparse attention to approximate the exact attention matrix with decreasing loss. Here, $n$ is the input length and $C \in (0, 1)$ is a constant.
* Stable $o(\log(n))$-sparse attention, which approximates the attention computation with $\log(n)$ or fewer entries, may not be feasible, since the approximation error persists at an $O(1)$ level.
* Choosing the window size of efficient attention methods adaptively ($\alpha \cdot n^C$, $\alpha > 0$) rather than fixing it is guaranteed to be both more accurate and more efficient for inference over flexible context lengths.
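To make the central claim concrete, below is a minimal NumPy sketch (not from the paper) comparing exact softmax attention with a top-$k$ sparse approximation whose per-row budget grows as $\alpha \cdot n^C$. The random inputs, the constants `C` and `alpha`, and the function names are illustrative assumptions, not the authors' experimental setup.

```python
import numpy as np

def exact_attention(Q, K, V):
    """Standard attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def topk_sparse_attention(Q, K, V, k):
    """Sparse attention: for each query, keep only its k largest scores
    and renormalize the softmax over that support."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Indices of the k largest scores in each row.
    top = np.argpartition(scores, -k, axis=-1)[:, -k:]
    # Everything outside the top-k support is masked to -inf (weight 0 after softmax).
    masked = np.full_like(scores, -np.inf)
    np.put_along_axis(masked, top, np.take_along_axis(scores, top, axis=-1), axis=-1)
    w = np.exp(masked - masked.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

# Compare the two on random inputs with an adaptive budget k = alpha * n^C.
rng = np.random.default_rng(0)
n, d, C, alpha = 2048, 64, 0.5, 4.0   # illustrative values only
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
k = max(1, int(alpha * n ** C))
err = np.abs(exact_attention(Q, K, V) - topk_sparse_attention(Q, K, V, k)).max()
print(f"n={n}, k={k}, max entrywise error={err:.4f}")
```

Under these assumptions, increasing `n` while letting `k` scale as $\alpha \cdot n^C$ keeps the entrywise error small, whereas capping `k` at a fixed constant lets it grow, which is the adaptive-window behavior the abstract describes.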
Cite
Text
Song et al. "How Sparse Attention Approximates Exact Attention? Your Attention Is Naturally $n^C$-Sparse." ICLR 2025 Workshops: SLLM, 2025.
Markdown
[Song et al. "How Sparse Attention Approximates Exact Attention? Your Attention Is Naturally $n^C$-Sparse." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/song2025iclrw-sparse/)
BibTeX
@inproceedings{song2025iclrw-sparse,
title = {{How Sparse Attention Approximates Exact Attention? Your Attention Is Naturally $n^C$-Sparse}},
author = {Song, Zhao and Xiong, Jing and Yang, Chiwun},
booktitle = {ICLR 2025 Workshops: SLLM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/song2025iclrw-sparse/}
}