PC-LoRA: Progressive Model Compression with Low Rank Adaptation
Abstract
This work presents Progressive Compression LoRA (PC-LoRA), a novel extension of Low-Rank Adaptation (LoRA) designed to enable both model compression and parameter-efficient fine-tuning. To mitigate the computational costs of large-scale models, PC-LoRA introduces an approach that decays the pre-trained model weights to zero. This enables model compression and efficient fine-tuning by progressively reducing the pre-trained weights during the fine-tuning process until they are completely removed. Through empirical analysis on various models, we demonstrate that PC-LoRA significantly reduces computational costs with only minor performance degradation. Compared to full fine-tuning and LoRA fine-tuning, PC-LoRA shows an average performance drop of 3.085%. Despite this, our method substantially compresses models, achieving a 94.1% / 89.1% reduction in parameters and FLOPs for vision models and a 93.5% / 84.2% reduction in parameters and FLOPs for NLP models.
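The sketch below illustrates the core idea described in the abstract: the frozen pre-trained weights are scaled by a decay factor that is annealed from 1 to 0 during fine-tuning while a trainable low-rank adapter takes over, so the pre-trained weights can be dropped entirely at the end. This is a minimal illustration, not the authors' implementation; the class name, the linear decay schedule, and the initialization choices are hypothetical assumptions.

```python
import torch
import torch.nn as nn


class PCLoRALinear(nn.Module):
    """Sketch of a PC-LoRA-style linear layer: the frozen pre-trained weight
    is scaled by a decay factor annealed from 1 to 0 during fine-tuning,
    while a trainable low-rank adapter (B @ A) gradually takes over."""

    def __init__(self, base_linear: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base_linear
        self.base.weight.requires_grad_(False)  # freeze pre-trained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        in_f, out_f = base_linear.in_features, base_linear.out_features
        self.lora_A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_f, rank))
        self.register_buffer("decay", torch.tensor(1.0))  # factor in [0, 1]

    def set_decay(self, step: int, total_steps: int):
        # Hypothetical linear schedule: decay goes from 1 -> 0 over training.
        self.decay.fill_(max(0.0, 1.0 - step / total_steps))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        frozen = self.decay * self.base(x)            # shrinking pre-trained path
        adapter = x @ self.lora_A.T @ self.lora_B.T   # low-rank adapter path
        return frozen + adapter


# Usage: once decay reaches 0, only the compact low-rank adapter remains,
# and the frozen base weight can be discarded to compress the model.
layer = PCLoRALinear(nn.Linear(768, 768), rank=8)
for step in range(1000):
    layer.set_decay(step, total_steps=1000)
    _ = layer(torch.randn(4, 768))
```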
Cite
Text
Hwang et al. "PC-LoRA: Progressive Model Compression with Low Rank Adaptation." ICLR 2024 Workshops: PML4LRS, 2024.
Markdown
[Hwang et al. "PC-LoRA: Progressive Model Compression with Low Rank Adaptation." ICLR 2024 Workshops: PML4LRS, 2024.](https://mlanthology.org/iclrw/2024/hwang2024iclrw-pclora/)
BibTeX
@inproceedings{hwang2024iclrw-pclora,
title = {{PC-LoRA: Progressive Model Compression with Low Rank Adaptation}},
author = {Hwang, Injoon and Park, HaeWon and Yang, Jooyoung and Maeng, SunJae and Lee, Youngwan},
booktitle = {ICLR 2024 Workshops: PML4LRS},
year = {2024},
url = {https://mlanthology.org/iclrw/2024/hwang2024iclrw-pclora/}
}