A Unified Framework for Sparse Plus Low-Rank Matrix Decomposition for LLMs

Abstract

The impressive capabilities of large foundation models come at the cost of the substantial computing resources required to serve them. Compressing these pre-trained models is of practical interest, as it can democratize their deployment across the machine learning community at large by lowering the costs associated with inference. A promising compression scheme is to decompose a foundation model's dense weights into a sum of sparse plus low-rank matrices. In this paper, we design a unified framework, coined $\texttt{HASSLE-free}$, for (semi-structured) sparse plus low-rank matrix decomposition of foundation models. Our framework introduces the local layer-wise reconstruction error objective for this decomposition; we demonstrate that prior work solves a relaxation of this optimization problem; and we provide efficient and scalable methods to minimize the $\textit{exact}$ introduced optimization problem. $\texttt{HASSLE-free}$ substantially outperforms state-of-the-art methods both in terms of the introduced objective and across a wide range of LLM evaluation benchmarks. For the Llama3-8B model with a 2:4-sparse component plus a rank-64 component, a compression scheme for which recent work demonstrates significant inference acceleration on GPUs, $\texttt{HASSLE-free}$ reduces test perplexity on the WikiText-2 dataset by $18$% and reduces the gap to the dense model, averaged over eight popular zero-shot tasks, by $28$% compared to existing methods.
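To make the decomposition concrete, here is a minimal NumPy sketch of a naive baseline: keep the two largest-magnitude entries in every group of four consecutive weights per row (2:4 semi-structured sparsity), then take a truncated SVD of the residual as the low-rank component. This illustrates only the $W \approx S + L$ structure; it is not the HASSLE-free algorithm, which instead minimizes a layer-wise reconstruction error.

```python
import numpy as np

def sparse_plus_low_rank(W, rank=4):
    """Illustrative baseline decomposition W ~= S + L.

    S: 2:4 semi-structured sparse matrix (top-2 magnitudes
       kept in each group of 4 consecutive entries per row).
    L: rank-`rank` truncated-SVD approximation of W - S.
    NOTE: a magnitude-pruning sketch, not the paper's method.
    """
    rows, cols = W.shape
    assert cols % 4 == 0, "columns must be divisible by 4"
    groups = W.reshape(rows, cols // 4, 4)
    # Indices of the 2 largest magnitudes in each group of 4.
    top2 = np.argsort(np.abs(groups), axis=-1)[..., 2:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, top2, True, axis=-1)
    S = (groups * mask).reshape(rows, cols)
    # Best rank-`rank` approximation of the residual via SVD.
    U, s, Vt = np.linalg.svd(W - S, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    return S, L

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
S, L = sparse_plus_low_rank(W, rank=2)
# Each group of 4 entries in S has at most 2 nonzeros.
assert (S.reshape(8, 4, 4) != 0).sum(axis=-1).max() <= 2
```

Replacing the magnitude-based mask and plain SVD here with a joint, layer-wise objective (measured on actual activations) is what distinguishes approaches like HASSLE-free from this naive two-step baseline.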

Cite

Text

Makni et al. "A Unified Framework for Sparse Plus Low-Rank Matrix Decomposition for LLMs." Conference on Parsimony and Learning, 2025.

Markdown

[Makni et al. "A Unified Framework for Sparse Plus Low-Rank Matrix Decomposition for LLMs." Conference on Parsimony and Learning, 2025.](https://mlanthology.org/cpal/2025/makni2025cpal-unified/)

BibTeX

@inproceedings{makni2025cpal-unified,
  title     = {{A Unified Framework for Sparse Plus Low-Rank Matrix Decomposition for LLMs}},
  author    = {Makni, Mehdi and Behdin, Kayhan and Xu, Zheng and Ponomareva, Natalia and Mazumder, Rahul},
  booktitle = {Conference on Parsimony and Learning},
  year      = {2025},
  pages     = {484--499},
  volume    = {280},
  url       = {https://mlanthology.org/cpal/2025/makni2025cpal-unified/}
}