PC-LoRA: Progressive Model Compression with Low Rank Adaptation

Abstract

This work presents Progressive Compression LoRA (PC-LoRA), a novel extension of Low-Rank Adaptation (LoRA) designed to enable both model compression and parameter-efficient fine-tuning. To mitigate the computational costs of large-scale models, PC-LoRA introduces an approach that decays the pre-trained model weights to zero: during fine-tuning, the pre-trained weights are progressively reduced until they are completely removed. Through empirical analysis on various models, we demonstrate that PC-LoRA significantly reduces computational costs with only minor performance degradation. Compared to full fine-tuning and LoRA fine-tuning, PC-LoRA shows an average performance drop of 3.085%. In exchange, it substantially compresses models, reducing parameters and FLOPs by 94.1% and 89.1% for vision models, and by 93.5% and 84.2% for NLP models.
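The sketch below illustrates the idea described in the abstract: a linear layer whose frozen pre-trained weights are scaled by a decay factor that shrinks from 1 to 0 over fine-tuning, while low-rank adapters are trained alongside. The class name, attribute names, and the linear decay schedule are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class PCLoRALinear(nn.Module):
    """Hypothetical PC-LoRA-style layer: frozen base weights are progressively decayed."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)          # LoRA update starts at zero
        self.decay = 1.0                            # 1.0 = full pre-trained weights, 0.0 = removed

    def set_decay(self, step: int, total_decay_steps: int):
        # Linear schedule (an assumption): pre-trained contribution shrinks to zero.
        self.decay = max(0.0, 1.0 - step / total_decay_steps)

    def forward(self, x):
        return self.decay * self.base(x) + self.lora_B(self.lora_A(x))
```

Under this reading, once the decay factor reaches zero the frozen base layer can be discarded and only the low-rank factors need to be kept, which is where the reported parameter and FLOP reductions would come from.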

Cite

Text

Hwang et al. "PC-LoRA: Progressive Model Compression with Low Rank Adaptation." ICLR 2024 Workshops: PML4LRS, 2024.

Markdown

[Hwang et al. "PC-LoRA: Progressive Model Compression with Low Rank Adaptation." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/hwang2024iclrw-pclora/)

BibTeX

@inproceedings{hwang2024iclrw-pclora,
  title     = {{PC-LoRA: Progressive Model Compression with Low Rank Adaptation}},
  author    = {Hwang, Injoon and Park, HaeWon and Yang, Jooyoung and Maeng, SunJae and Lee, Youngwan},
  booktitle = {ICLR 2024 Workshops: PML4LRS},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/hwang2024iclrw-pclora/}
}