AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-Wise Pruning of Large Language Models

Abstract

Recent work on pruning large language models (LLMs) has shown that one can eliminate a large number of parameters without compromising performance, making pruning a promising strategy to reduce LLM model size. Existing LLM pruning strategies typically assign uniform pruning ratios across layers, which limits overall pruning ability, and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance. In this paper, we leverage Heavy-Tailed Self-Regularization (HT-SR) Theory, in particular the shape of empirical spectral densities (ESDs) of weight matrices, to design improved layerwise pruning ratios for LLMs. Our analysis reveals a wide variability in how well-trained, and thus how prunable, different layers of an LLM are. Based on this, we propose AlphaPruning, which uses shape metrics to allocate layerwise sparsity ratios in a more theoretically principled manner. AlphaPruning can be used in conjunction with multiple existing LLM pruning methods. Our empirical results show that AlphaPruning prunes LLaMA-7B to 80% sparsity while maintaining reasonable perplexity, marking a first in the literature on LLMs.
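
At a high level, the allocation step can be sketched as follows: fit the tail of each layer's ESD with a power law and map the fitted exponents to per-layer sparsity ratios, so that layers with lighter-tailed ESDs (presumed less well-trained) are pruned more. The snippet below is a minimal illustrative sketch in NumPy, not the paper's exact procedure: the Hill-estimator fit, the linear alpha-to-sparsity mapping, and the spread parameter are assumptions made for this example.

# Hypothetical sketch of ESD-shape-based layerwise sparsity allocation.
# The Hill-estimator power-law fit and the linear alpha-to-sparsity mapping
# are illustrative assumptions, not the paper's exact allocation rule.
import numpy as np

def hill_alpha(weight: np.ndarray, k_frac: float = 0.1) -> float:
    """Estimate the power-law exponent (alpha) of the tail of the ESD of
    W^T W / n using the Hill estimator on the largest eigenvalues."""
    n = weight.shape[0]
    eigs = np.linalg.eigvalsh(weight.T @ weight / n)   # ESD of the correlation matrix
    eigs = np.sort(eigs[eigs > 1e-12])
    k = max(2, int(k_frac * len(eigs)))                # size of the fitted tail
    tail = eigs[-k:]
    # Hill estimator: alpha = 1 + k / sum(log(lambda_i / lambda_min_of_tail)).
    return 1.0 + k / np.sum(np.log(tail / tail[0]))

def allocate_sparsity(alphas, target: float, spread: float = 0.2) -> np.ndarray:
    """Map per-layer alphas to sparsity ratios: larger alpha (lighter tail,
    presumed less well-trained) gets more pruning; mean sparsity == target."""
    a = np.asarray(alphas, dtype=float)
    scaled = (a - a.min()) / (a.max() - a.min() + 1e-12)   # normalize to [0, 1]
    s = (target - spread / 2) + spread * scaled            # linear interpolation
    s += target - s.mean()                                  # re-center on the target
    return np.clip(s, 0.0, 0.99)

# Toy usage: random matrices standing in for an LLM's layer weight matrices.
rng = np.random.default_rng(0)
layers = [rng.standard_normal((256, 256)) for _ in range(6)]
alphas = [hill_alpha(w) for w in layers]
print(allocate_sparsity(alphas, target=0.7))

Per-layer ratios produced this way can then replace a uniform ratio in an existing unstructured pruning criterion (e.g., magnitude-based, Wanda, or SparseGPT), which is the sense in which the abstract says AlphaPruning composes with existing LLM pruning methods.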

Cite

Text

Lu et al. "AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-Wise Pruning of Large Language Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-0289

Markdown

[Lu et al. "AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-Wise Pruning of Large Language Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lu2024neurips-alphapruning/) doi:10.52202/079017-0289

BibTeX

@inproceedings{lu2024neurips-alphapruning,
  title     = {{AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-Wise Pruning of Large Language Models}},
  author    = {Lu, Haiquan and Zhou, Yefan and Liu, Shiwei and Wang, Zhangyang and Mahoney, Michael W. and Yang, Yaoqing},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0289},
  url       = {https://mlanthology.org/neurips/2024/lu2024neurips-alphapruning/}
}