Accurate LoRA-Finetuning Quantization of LLMs via Information Retention
Abstract
The LoRA-finetuning quantization of LLMs has been extensively studied to obtain accurate yet compact LLMs for deployment on resource-constrained hardware. However, existing methods cause the quantized LLM to degrade severely and even fail to benefit from LoRA finetuning. This paper proposes IR-QLoRA, a novel method that pushes quantized LLMs with LoRA to high accuracy through information retention. IR-QLoRA mainly relies on two techniques derived from a unified information perspective: (1) statistics-based Information Calibration Quantization allows the quantized parameters of the LLM to retain the original information accurately; (2) finetuning-based Information Elastic Connection enables LoRA to utilize elastic representation transformations with diverse information. Comprehensive experiments show that IR-QLoRA significantly improves accuracy across the LLaMA and LLaMA2 families under 2-4 bit-widths; e.g., 4-bit LLaMA-7B achieves a 1.4% improvement on MMLU over state-of-the-art methods. This significant performance gain costs only 0.31% additional time, revealing the satisfactory efficiency of IR-QLoRA. We highlight that IR-QLoRA enjoys excellent versatility: it is compatible with various frameworks (e.g., NormalFloat and Integer quantization) and brings general accuracy gains. The code is available at https://github.com/htqin/ir-qlora.
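The abstract names its two techniques without detail, so the following is a minimal, illustrative sketch of the underlying ideas: an entropy-maximizing calibration constant for block-wise NF4-style quantization (the "information calibration" flavor) and a parameter-free grouped-mean connection on the LoRA down-projection (the "elastic connection" flavor). The toy codebook, the grid search, the function names (icq_quantize, iec_down), and the shapes are all assumptions made for exposition, not the released IR-QLoRA implementation.

```python
import numpy as np

# Toy stand-in for the NormalFloat-4 codebook: 16 levels in [-1, 1].
NF4_LEVELS = np.sort(np.concatenate([
    -np.linspace(1.0, 0.0, 8, endpoint=False),  # 8 negative levels
    np.linspace(0.0, 1.0, 8),                   # 8 non-negative levels
]))

def quantize_block(w, tau):
    """Shift a weight block by tau, rescale to [-1, 1], snap to the codebook."""
    shifted = w - tau
    scale = np.abs(shifted).max() + 1e-12
    idx = np.abs(shifted[:, None] / scale - NF4_LEVELS[None, :]).argmin(axis=1)
    return idx, scale

def entropy_bits(idx, n_levels=16):
    """Shannon entropy of the quantized index distribution (max 4 bits)."""
    p = np.bincount(idx, minlength=n_levels) / idx.size
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def icq_quantize(w, n_grid=51):
    """Grid-search a calibration constant tau that maximizes the entropy of
    the quantized weights, so they retain as much information as possible."""
    taus = np.linspace(-0.5, 0.5, n_grid) * w.std()
    tau = max(taus, key=lambda t: entropy_bits(quantize_block(w, t)[0]))
    idx, scale = quantize_block(w, tau)
    return idx, scale, tau

def iec_down(x, A):
    """Toy 'elastic connection' on the LoRA down-projection: a parameter-free
    grouped mean of the input is added to the rank-r intermediate, so the
    adapter also sees the original representation (requires r | d_in)."""
    n, d_in = x.shape
    r = A.shape[1]
    return x @ A + x.reshape(n, r, d_in // r).mean(axis=-1)

# Usage on one 64-element weight block (QLoRA-style block size).
rng = np.random.default_rng(0)
block = rng.normal(0.01, 0.02, size=64)          # slightly biased weights
idx, scale, tau = icq_quantize(block)
print(f"tau={tau:.4f}, entropy={entropy_bits(idx):.3f} bits")
dequant = NF4_LEVELS[idx] * scale + tau          # reconstruction adds tau back
h = iec_down(rng.normal(size=(4, 64)), 0.01 * rng.normal(size=(64, 8)))  # (4, 8)
```

The point of the search over tau is that a biased or skewed weight block uses the codebook unevenly; recentering before quantization spreads the blocks over more levels, raising the entropy and thus the information retained after dequantization.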
Cite
Text
Qin et al. "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention." International Conference on Machine Learning, 2024.

Markdown
[Qin et al. "Accurate LoRA-Finetuning Quantization of LLMs via Information Retention." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/qin2024icml-accurate/)

BibTeX
@inproceedings{qin2024icml-accurate,
title = {{Accurate LoRA-Finetuning Quantization of LLMs via Information Retention}},
author = {Qin, Haotong and Ma, Xudong and Zheng, Xingyu and Li, Xiaoyang and Zhang, Yang and Liu, Shouda and Luo, Jie and Liu, Xianglong and Magno, Michele},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {41498--41516},
volume = {235},
url = {https://mlanthology.org/icml/2024/qin2024icml-accurate/}
}