Fast and Effective Weight Update for Pruned Large Language Models
Abstract
Pruning large language models (LLMs) is a challenging task due to their enormous size. The primary difficulty is fine-tuning the model after pruning, which is needed to recover the performance lost by dropping weights. Recent approaches have either ignored fine-tuning entirely, focusing on efficient pruning criteria, or attempted layer-wise weight updates that preserve the behavior of each layer. However, even layer-wise weight updates can be costly for LLMs, and previous works have resorted to various approximations. In our paper, we propose a fast and effective weight update algorithm for pruned layers based on the Alternating Direction Method of Multipliers (ADMM). We further extend it with a simple gradual pruning mask selection and achieve state-of-the-art pruning performance across a wide range of LLMs.
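To make the layer-wise update concrete, here is a minimal NumPy sketch of what an ADMM-based weight update for a single pruned linear layer can look like, assuming the standard layer-wise reconstruction objective: minimize ||XW − XW0||_F² over weights W constrained to a fixed pruning mask. Everything here is an illustrative assumption rather than the paper's released code: the function names (`admm_weight_update`, `gradual_prune`), the penalty parameter `rho`, the iteration counts, and the plain magnitude criterion used for the gradual mask selection.

```python
import numpy as np

def admm_weight_update(X, W0, mask, rho=1.0, iters=20):
    """Update the kept weights of one pruned linear layer.

    Approximately solves: minimize 0.5 * ||X @ W - X @ W0||_F**2
    subject to W being zero outside `mask`, using ADMM.
    X: (n, d_in) calibration inputs, W0: (d_in, d_out) dense weights,
    mask: boolean (d_in, d_out), True where a weight is kept.
    """
    d_in = X.shape[1]
    H = X.T @ X
    # The only expensive step: one inverse per layer, reused every iteration.
    H_inv = np.linalg.inv(H + rho * np.eye(d_in))
    HW0 = H @ W0

    Z = np.where(mask, W0, 0.0)  # feasible start: masked original weights
    U = np.zeros_like(W0)        # scaled dual variable
    for _ in range(iters):
        W = H_inv @ (HW0 + rho * (Z - U))  # ridge-like least-squares solve
        Z = np.where(mask, W + U, 0.0)     # projection onto the pruning mask
        U += W - Z                         # dual update
    return Z  # Z is exactly zero on pruned entries

def gradual_prune(X, W0, target_sparsity=0.5, steps=5, **admm_kw):
    """Gradually raise sparsity, re-selecting the mask between ADMM runs.

    The plain magnitude criterion below is a stand-in assumption;
    the paper's actual mask-selection rule may differ.
    """
    W = W0.copy()
    for s in range(1, steps + 1):
        sparsity = target_sparsity * s / steps
        k = int(sparsity * W.size)  # number of weights to prune
        thresh = np.partition(np.abs(W), k, axis=None)[k]
        mask = np.abs(W) >= thresh
        W = admm_weight_update(X, W0, mask, **admm_kw)
    return W, mask
```

The structure shows why such an update can be fast: the single (d_in × d_in) inverse is computed once per layer, every subsequent ADMM iteration is just matrix multiplications plus elementwise masking, and the Z-step satisfies the sparsity constraint exactly at every iteration.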
Cite
Text
Boža. "Fast and Effective Weight Update for Pruned Large Language Models." Transactions on Machine Learning Research, 2024.Markdown
[Boža. "Fast and Effective Weight Update for Pruned Large Language Models." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/boza2024tmlr-fast/)BibTeX
@article{boza2024tmlr-fast,
  title = {{Fast and Effective Weight Update for Pruned Large Language Models}},
  author = {Boža, Vladimír},
  journal = {Transactions on Machine Learning Research},
  year = {2024},
  url = {https://mlanthology.org/tmlr/2024/boza2024tmlr-fast/}
}