LSA: Layer-Wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error

Abstract

Deploying large language models (LLMs) on platforms with insufficient computational resources remains a key challenge. Weight pruning is an efficient model compression technique that can reduce model size without retraining LLMs. However, due to the massive number of parameters, it is infeasible to estimate the importance of weights globally, and most prior studies assign a uniform sparsity ratio across all layers. Recent findings reveal that layers contribute unevenly to LLM performance, making it necessary to investigate layer-wise importance. Existing layer-wise sparsity allocation methods, such as OWL and DLP, rely on weight scoring and carefully designed score proxies to estimate layer-wise importance and sparsity ratios, while enforcing identical sparsity across the blocks and projection weights within a layer to avoid performance degradation. In this work, we propose Layer-wise Sparsity Allocation (LSA) for LLM pruning, which quantifies layer-wise importance by evaluating the minimal linear reconstruction error (LSE) of each transformer layer under the assumption that 50% of its least important weights are removed. Moreover, our method supports non-uniform sparsity allocation at block- or projection-level granularity within layers, without incurring catastrophic performance degradation. Experimental results demonstrate that LSA maintains high performance at high sparsity levels. At an overall sparsity ratio of 70%, LSA surpasses state-of-the-art methods on language modeling tasks and seven zero-shot tasks.

Cite

Text

Yang et al. "LSA: Layer-Wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error." International Conference on Learning Representations, 2026.

Markdown

[Yang et al. "LSA: Layer-Wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yang2026iclr-lsa/)

BibTeX

@inproceedings{yang2026iclr-lsa,
  title     = {{LSA: Layer-Wise Sparsity Allocation for Large Language Model Pruning Based on Minimal Linear Reconstruction Error}},
  author    = {Yang, Zhiguo and Deng, Changjian and Chen, Qinke and Zhou, Zijing and Cheng, Jian},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/yang2026iclr-lsa/}
}