Accurate Retraining-Free Pruning for Pretrained Encoder-Based Language Models

Abstract

Given a pretrained encoder-based language model, how can we accurately compress it without retraining? Retraining-free structured pruning algorithms are crucial for compressing pretrained language models because of their significantly reduced pruning cost and their ability to prune large language models. However, existing retraining-free algorithms suffer severe accuracy degradation, as they fail to handle pruning errors, especially at high compression rates. In this paper, we propose KPrune (Knowledge-preserving pruning), an accurate retraining-free structured pruning algorithm for pretrained encoder-based language models. KPrune focuses on preserving the useful knowledge of the pretrained model to minimize pruning errors through a carefully designed iterative pruning process composed of knowledge measurement, knowledge-preserving mask search, and knowledge-preserving weight-tuning. As a result, KPrune achieves significant accuracy improvements, up to a 58.02%p higher F1 score than existing retraining-free pruning algorithms, at a high compression rate of 80% on the SQuAD benchmark, without any retraining process.
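
The abstract names a three-step pipeline (knowledge measurement, knowledge-preserving mask search, knowledge-preserving weight-tuning). The sketch below is a minimal, hypothetical toy analogue of that pipeline on a single two-layer block, not the authors' implementation: all names, the scoring rule, and the least-squares weight refit are illustrative assumptions, and the real method operates iteratively over attention heads and feed-forward neurons of a Transformer encoder.

```python
# Toy sketch of a retraining-free, knowledge-preserving pruning pass.
# Assumption-laden illustration only; NOT the KPrune implementation.
import numpy as np

rng = np.random.default_rng(0)

# Toy block: Y = relu(X @ W1) @ W2. Hidden units (columns of W1, rows of W2)
# stand in for prunable structures such as heads or intermediate neurons.
X  = rng.normal(size=(512, 64))           # unlabeled calibration inputs
W1 = rng.normal(size=(64, 128)) / 8.0
W2 = rng.normal(size=(128, 64)) / 8.0
H_teacher = np.maximum(X @ W1, 0.0)       # hidden activations of the dense block
Y_teacher = H_teacher @ W2                # "knowledge" (outputs) to preserve

def measure_knowledge():
    """Step 1 (knowledge measurement): score each hidden unit by how much of
    the teacher output it carries (squared norm of its output contribution)."""
    return np.array([np.sum(np.outer(H_teacher[:, u], W2[u]) ** 2)
                     for u in range(W2.shape[0])])

def search_mask(scores, compression_rate):
    """Step 2 (knowledge-preserving mask search): keep the highest-scoring
    units so the retained sub-network preserves the most measured knowledge."""
    n_keep = int(round(len(scores) * (1.0 - compression_rate)))
    keep = np.argsort(scores)[::-1][:n_keep]
    mask = np.zeros(len(scores), dtype=bool)
    mask[keep] = True
    return mask

def tune_weights(mask):
    """Step 3 (knowledge-preserving weight-tuning): refit surviving output
    weights by least squares so the pruned block reproduces the dense outputs
    on the calibration data, with no gradient-based retraining."""
    H_kept = H_teacher[:, mask]
    W2_new, *_ = np.linalg.lstsq(H_kept, Y_teacher, rcond=None)
    return W1[:, mask], W2_new

scores = measure_knowledge()
mask = search_mask(scores, compression_rate=0.8)   # prune 80% of hidden units
W1_p, W2_p = tune_weights(mask)
Y_pruned = np.maximum(X @ W1_p, 0.0) @ W2_p
print("relative reconstruction error:",
      np.linalg.norm(Y_pruned - Y_teacher) / np.linalg.norm(Y_teacher))
```

In this toy setting the closed-form least-squares refit plays the role of weight-tuning after each mask-search step, which is what lets the compression proceed without any retraining loop.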

Cite

Text

Park et al. "Accurate Retraining-Free Pruning for Pretrained Encoder-Based Language Models." International Conference on Learning Representations, 2024.

Markdown

[Park et al. "Accurate Retraining-Free Pruning for Pretrained Encoder-Based Language Models." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/park2024iclr-accurate/)

BibTeX

@inproceedings{park2024iclr-accurate,
  title     = {{Accurate Retraining-Free Pruning for Pretrained Encoder-Based Language Models}},
  author    = {Park, Seungcheol and Choi, Hojun and Kang, U},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/park2024iclr-accurate/}
}