Scatterbrain: Unifying Sparse and Low-Rank Attention

Abstract

Recent advances in efficient Transformers have exploited either the sparsity or low-rank properties of attention matrices to reduce the computational and memory bottlenecks of modeling long sequences. However, it remains challenging to balance the trade-off between model quality and efficiency: no one-size-fits-all approximation works well across tasks. To better understand this trade-off, we observe that sparse and low-rank approximations excel in different regimes, determined by the softmax temperature in attention, and that their combination can outperform each individually. Inspired by the classical robust-PCA algorithm for sparse and low-rank decomposition, we propose Scatterbrain, a novel way to unify sparse (via locality sensitive hashing) and low-rank (via kernel feature map) attention for accurate and efficient approximation. The estimation is unbiased with provably low error. We empirically show that Scatterbrain achieves $2.1\times$ lower approximation error than baselines when serving as a drop-in replacement in BigGAN image generation and pre-trained T2T-ViT. On a pre-trained T2T Vision Transformer, even without fine-tuning, Scatterbrain can reduce $98\%$ of attention memory at the cost of only a $1\%$ drop in accuracy. For end-to-end training, Scatterbrain achieves up to $4$ points better perplexity and $5$ points better average accuracy than sparse or low-rank efficient Transformers on language modeling and long-range-arena tasks.
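The sparse + low-rank idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: a Performer-style positive random feature map supplies the low-rank term, and, for brevity, exact correction of the largest attention scores stands in for the paper's LSH-based sparse component (a real implementation would locate those entries via hashing without materializing the full score matrix). All function names here are illustrative.

```python
import numpy as np

def feature_map(x, proj):
    # Positive random features: E[phi(q) . phi(k)] approximates exp(q . k)
    # for proj columns drawn i.i.d. from N(0, I).
    return np.exp(x @ proj - np.sum(x**2, axis=-1, keepdims=True) / 2)

def scatterbrain_attention(q, k, v, n_features=64, seed=0):
    """Sketch of sparse + low-rank softmax-attention approximation."""
    rng = np.random.default_rng(seed)
    d = q.shape[-1]
    proj = rng.standard_normal((d, n_features))

    # Low-rank term: qf @ kf.T approximates exp(q k^T / sqrt(d)) in expectation.
    scale = d ** 0.25  # so that (q/scale) . (k/scale) = q . k / sqrt(d)
    qf = feature_map(q / scale, proj)
    kf = feature_map(k / scale, proj)
    low_rank = qf @ kf.T / n_features

    # Sparse term: correct the largest entries exactly. Here a top-entry
    # threshold stands in for LSH; the full `exact` matrix is formed only
    # for illustration (Scatterbrain avoids this via hashing).
    exact = np.exp(q @ k.T / np.sqrt(d))
    mask = exact > np.quantile(exact, 0.95)  # keep the top ~5% of entries
    approx = low_rank + mask * (exact - low_rank)

    # Row-normalize to obtain attention weights, then aggregate values.
    attn = approx / approx.sum(axis=-1, keepdims=True)
    return attn @ v
```

The key design point is that the two terms compensate for each other: the random-feature term captures the diffuse (near-uniform) part of the softmax, while the sparse correction repairs exactly those entries where a low-rank approximation is weakest, mirroring the robust-PCA decomposition the paper draws on.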

Cite

Text

Chen et al. "Scatterbrain: Unifying Sparse and Low-Rank Attention." Neural Information Processing Systems, 2021.

Markdown

[Chen et al. "Scatterbrain: Unifying Sparse and Low-Rank Attention." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/chen2021neurips-scatterbrain/)

BibTeX

@inproceedings{chen2021neurips-scatterbrain,
  title     = {{Scatterbrain: Unifying Sparse and Low-Rank Attention}},
  author    = {Chen, Beidi and Dao, Tri and Winsor, Eric and Song, Zhao and Rudra, Atri and Ré, Christopher},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/chen2021neurips-scatterbrain/}
}