LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation
Abstract
Transformer models have achieved remarkable results in various natural language tasks, but they are often prohibitively large, requiring massive memory and computational resources. To reduce the size and complexity of these models, we propose LoSparse (Low-Rank and Sparse approximation), a novel model compression technique that approximates a weight matrix by the sum of a low-rank matrix and a sparse matrix. Our method combines the advantages of both low-rank approximations and pruning, while avoiding their limitations. Low-rank approximation compresses the coherent and expressive parts of neurons, while pruning removes the incoherent and non-expressive parts. Pruning enhances the diversity of low-rank approximations, and low-rank approximation prevents pruning from losing too many expressive neurons. We evaluate our method on natural language understanding, question answering, and natural language generation tasks. We show that it significantly outperforms existing compression methods. Our code is publicly available at https://github.com/yxli2123/LoSparse
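The abstract describes approximating each weight matrix W as the sum of a low-rank factorization UV and a sparse matrix S. As a rough illustration of that decomposition, here is a minimal NumPy sketch that builds the low-rank part with a truncated SVD and the sparse part by magnitude-pruning the residual. The function name, the rank and sparsity parameters, and the unstructured pruning rule are illustrative assumptions, not the paper's algorithm; LoSparse itself performs structured (neuron-level) pruning learned during training.

```python
import numpy as np

def losparse_decompose(W, rank, sparsity):
    """Approximate W as U @ V (low-rank) + S (sparse).

    A sketch of the low-rank-plus-sparse idea only: truncated SVD for
    the low-rank part, unstructured magnitude pruning of the residual
    for the sparse part. The actual method prunes structured neuron
    groups and learns the factors during training.
    """
    # Low-rank part: rank-r truncated SVD of W.
    U_full, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_full[:, :rank] * s[:rank]   # shape (m, rank), singular values folded in
    V = Vt[:rank, :]                  # shape (rank, n)

    # Sparse part: keep only the largest-magnitude entries of the residual.
    R = W - U @ V
    k = max(1, int(sparsity * R.size))  # number of entries to keep
    threshold = np.partition(np.abs(R), R.size - k, axis=None)[R.size - k]
    S = np.where(np.abs(R) >= threshold, R, 0.0)

    return U, V, S

# Example: decompose a random 512x512 "weight matrix".
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
U, V, S = losparse_decompose(W, rank=32, sparsity=0.05)
err = np.linalg.norm(W - (U @ V + S)) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```

Storing U, V, and S in place of W is what yields the compression: for an m-by-n matrix, the low-rank factors cost (m + n) * rank entries and the sparse part costs only its nonzeros, versus m * n for the dense original.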
Cite
Text

Li et al. "LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation." International Conference on Machine Learning, 2023.

Markdown

[Li et al. "LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/li2023icml-losparse/)

BibTeX
@inproceedings{li2023icml-losparse,
title = {{LoSparse: Structured Compression of Large Language Models Based on Low-Rank and Sparse Approximation}},
author = {Li, Yixiao and Yu, Yifan and Zhang, Qingru and Liang, Chen and He, Pengcheng and Chen, Weizhu and Zhao, Tuo},
booktitle = {International Conference on Machine Learning},
year = {2023},
pages = {20336--20350},
volume = {202},
url = {https://mlanthology.org/icml/2023/li2023icml-losparse/}
}