Soft Prompt Recovers Compressed LLMs, Transferably

ICML 2024 pp. 55186-55203

Abstract

Model compression is one of the most popular approaches to improving the accessibility of Large Language Models (LLMs) by reducing their memory footprint. However, gaining such efficiency benefits often demands extensive engineering effort and intricate designs to mitigate the performance decline. In this work, we leverage (Soft) Prompt Tuning in its most vanilla form and discover that such conventionally learned soft prompts can recover the performance of compressed LLMs. More surprisingly, we observe that this recovery effect is transferable among different tasks and models (albeit subject to natural tokenizer and dimensionality limitations), which further reduces overhead and subverts the common belief that learned soft prompts are task-specific. Our work is fully orthogonal to and compatible with model compression frameworks such as pruning and quantization, where we enable an LLM compressed by up to $8\times$ (with joint 4-bit quantization and 50% weight pruning) to match its uncompressed counterpart on popular benchmarks. We note that we are the first to reveal that vanilla Parameter-Efficient Fine-Tuning (PEFT) techniques have the potential to be used for compression recovery, opening a new line of opportunities for advancing model accessibility while freeing our fellow researchers from the previously present engineering burdens and constraints. The code is available at https://github.com/zirui-ray-liu/compress-then-prompt.
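
The $8\times$ figure quoted in the abstract can be checked with simple arithmetic. The sketch below is only an illustration under stated assumptions: it assumes a 16-bit baseline weight format (the abstract does not name the baseline precision) and ignores any storage overhead for sparse indices or quantization scales. The function name `compression_ratio` and its parameters are hypothetical and are not taken from the authors' code.

```python
def compression_ratio(baseline_bits: int, quantized_bits: int, sparsity: float) -> float:
    """Idealized weight-memory reduction from quantization combined with pruning.

    Assumptions (not stated in the abstract): weights are stored at
    `baseline_bits` before compression, and pruned weights cost no storage.
    """
    if baseline_bits <= 0 or quantized_bits <= 0:
        raise ValueError("bit widths must be positive")
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    quantization_gain = baseline_bits / quantized_bits  # e.g. 16 / 4 = 4
    pruning_gain = 1.0 / (1.0 - sparsity)               # e.g. 1 / 0.5 = 2
    return quantization_gain * pruning_gain


# Joint 4-bit quantization and 50% weight pruning, assuming a 16-bit baseline.
ratio = compression_ratio(baseline_bits=16, quantized_bits=4, sparsity=0.5)
print(f"{ratio:g}x")  # prints "8x"
```

Under these assumptions the two gains multiply, $4 \times 2 = 8$, which is consistent with the "up to $8\times$" claim; real deployments would typically see a somewhat smaller ratio once metadata overhead is counted.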

Cite

Text

Xu et al. "Soft Prompt Recovers Compressed LLMs, Transferably." International Conference on Machine Learning, 2024.

Markdown

[Xu et al. "Soft Prompt Recovers Compressed LLMs, Transferably." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/xu2024icml-soft/)

BibTeX

@inproceedings{xu2024icml-soft,
  title     = {{Soft Prompt Recovers Compressed LLMs, Transferably}},
  author    = {Xu, Zhaozhuo and Liu, Zirui and Chen, Beidi and Zhong, Shaochen and Tang, Yuxin and Wang, Jue and Zhou, Kaixiong and Hu, Xia and Shrivastava, Anshumali},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {55186-55203},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/xu2024icml-soft/}
}