Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs

Lin Song, Yukang Chen, Shuai Yang, Xiaohan Ding, Yixiao Ge, Ying-Cong Chen, Ying Shan

CVPR 2024 pp. 13763-13773

doi:10.1109/CVPR52733.2024.01306

Abstract

This paper focuses on the high computational complexity in Large Language Models (LLMs), a significant challenge in both natural language processing (NLP) and multi-modal tasks. We propose Low-Rank Approximation for Sparse Attention (LoRA-Sparse), an innovative approach that strategically reduces this complexity. LoRA-Sparse introduces low-rank linear projection layers for sparse attention approximation. It utilizes an order-mimic training methodology, which is crucial for efficiently approximating the self-attention mechanism in LLMs. We empirically show that sparse attention not only reduces computational demands but also enhances model performance in both NLP and multi-modal tasks. This surprisingly shows that redundant attention in LLMs might be non-beneficial. We extensively validate LoRA-Sparse through rigorous empirical studies in both NLP and multi-modal tasks, demonstrating its effectiveness and general applicability. Based on LLaMA and LLaVA models, our methods can reduce more than half of the self-attention computation with even better performance than full-attention baselines.
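
To make the idea of low-rank projections for sparse attention approximation concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not specify how approximate scores are turned into a sparse pattern, so the top-k selection rule, the function names (`sparse_attention`, `full_attention`), and the parameters (`proj_q`, `proj_k`, `keep`) are all assumptions made for illustration. The projection matrices are random placeholders standing in for trained ones, the order-mimic training step is not shown, and causal masking is omitted.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def full_attention(q, k, v):
    """Dense scaled dot-product attention, used as the reference."""
    d = len(q[0])
    out = []
    for qi in q:
        w = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(d)])
    return out

def sparse_attention(q, k, v, proj_q, proj_k, keep):
    """Assumed scheme: rank keys with cheap low-rank scores, then run exact
    attention only over the `keep` highest-ranked keys for each query."""
    d = len(q[0])
    q_low = matmul(q, proj_q)  # shape (n, r), with r much smaller than d
    k_low = matmul(k, proj_k)  # shape (n, r)
    out, chosen = [], []
    for i, qi in enumerate(q):
        # Approximate scores cost O(r) per query-key pair instead of O(d).
        approx = [sum(a * b for a, b in zip(q_low[i], kl)) for kl in k_low]
        ranked = sorted(range(len(k)), key=lambda j: approx[j], reverse=True)
        idx = sorted(ranked[:keep])
        # Exact scores and softmax are computed for the selected keys only.
        w = softmax([sum(a * b for a, b in zip(qi, k[j])) / math.sqrt(d) for j in idx])
        out.append([sum(wj * v[j][c] for wj, j in zip(w, idx)) for c in range(d)])
        chosen.append(idx)
    return out, chosen

# Toy sizes, chosen only for illustration.
random.seed(0)
n, d, r = 6, 4, 2

def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

q, k, v = rand_matrix(n, d), rand_matrix(n, d), rand_matrix(n, d)
proj_q, proj_k = rand_matrix(d, r), rand_matrix(d, r)  # untrained placeholders

out, chosen = sparse_attention(q, k, v, proj_q, proj_k, keep=3)
print(chosen[0])  # indices of the keys that query 0 attends to
```

With `keep` equal to the sequence length, every key is selected and the result matches `full_attention`. With `keep=3` out of 6 keys, each query computes exact attention over half of the keys, which mirrors the "more than half" reduction the abstract reports only in spirit, not in measured cost.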

Cite

Text

Song et al. "Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01306

Markdown

[Song et al. "Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/song2024cvpr-lowrank/) doi:10.1109/CVPR52733.2024.01306

BibTeX

@inproceedings{song2024cvpr-lowrank,
  title     = {{Low-Rank Approximation for Sparse Attention in Multi-Modal LLMs}},
  author    = {Song, Lin and Chen, Yukang and Yang, Shuai and Ding, Xiaohan and Ge, Yixiao and Chen, Ying-Cong and Shan, Ying},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {13763--13773},
  doi       = {10.1109/CVPR52733.2024.01306},
  url       = {https://mlanthology.org/cvpr/2024/song2024cvpr-lowrank/}
}