LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs Under 2 Bits

Abstract

Fine-tuning large language models (LLMs) is increasingly costly as models scale to hundreds of billions of parameters, and even parameter-efficient fine-tuning (PEFT) methods like LoRA remain resource-intensive. We introduce LowRA, the first framework to enable LoRA fine-tuning below 2 bits per parameter with minimal performance loss. LowRA optimizes fine-grained quantization—mapping, threshold selection, and precision assignment—while leveraging efficient CUDA kernels for scalable deployment. Extensive evaluations across 4 LLMs and 4 datasets show that LowRA achieves a superior performance–precision trade-off above 2 bits and remains accurate down to 1.15 bits, reducing memory usage by up to 50%. Our results highlight the potential of ultra-low-bit LoRA fine-tuning for resource-constrained environments.
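To make the abstract's "precision assignment" idea concrete, below is a minimal, hypothetical sketch in Python/NumPy. It is not LowRA's actual algorithm; all function names (quantize_group, mixed_precision_quantize) and the channel-norm sensitivity proxy are illustrative assumptions. It only shows the general mechanism of mixing 1-bit and 2-bit groups so the average cost per parameter lands below 2 bits.

# A minimal, illustrative sketch of mixed-precision weight quantization
# under an average bit budget. Hypothetical names throughout; this is
# NOT LowRA's implementation, just a toy demonstration of assigning
# different precisions (1-bit vs. 2-bit) to different weight groups.
import numpy as np

def quantize_group(w, bits):
    """Uniform absmax quantize-dequantize of one weight group to 2**bits levels."""
    scale = np.abs(w).max() + 1e-12
    levels = 2 ** bits - 1
    q = np.round((w / scale + 1.0) / 2.0 * levels)  # map [-s, s] -> {0..levels}
    return (q / levels * 2.0 - 1.0) * scale         # dequantize back to floats

def mixed_precision_quantize(W, avg_bit_budget=1.5):
    """Give 2 bits to the rows with the largest L2 norm (a crude sensitivity
    proxy, assumed here for illustration) and 1 bit to the rest, so the mean
    bit-width meets the budget: avg = 2*f + 1*(1-f)  =>  f = avg - 1."""
    out_rows, _ = W.shape
    frac_2bit = np.clip(avg_bit_budget - 1.0, 0.0, 1.0)
    n_2bit = int(round(frac_2bit * out_rows))
    sensitivity = np.linalg.norm(W, axis=1)
    order = np.argsort(-sensitivity)                # most sensitive rows first
    bits = np.ones(out_rows, dtype=int)
    bits[order[:n_2bit]] = 2
    W_q = np.stack([quantize_group(W[i], bits[i]) for i in range(out_rows)])
    return W_q, bits

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16)).astype(np.float32)
    W_q, bits = mixed_precision_quantize(W, avg_bit_budget=1.5)
    print("average bits/param:", bits.mean())
    print("reconstruction MSE:", float(((W - W_q) ** 2).mean()))

Fractional averages such as the paper's 1.15 bits arise from exactly this kind of mix; the paper additionally optimizes the quantization mapping and thresholds themselves, which this toy example fixes to uniform absmax levels.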

Cite

Text

Zhou et al. "LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs Under 2 Bits." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Zhou et al. "LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs Under 2 Bits." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/zhou2025icml-lowra/)

BibTeX

@inproceedings{zhou2025icml-lowra,
  title     = {{LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs Under 2 Bits}},
  author    = {Zhou, Zikai and Zhang, Qizheng and Kumbong, Hermann and Olukotun, Kunle},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {79570--79594},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/zhou2025icml-lowra/}
}