ReALLM: A General Framework for LLM Compression and Fine-Tuning
Abstract
We introduce ReALLM, a novel approach for compression and memory-efficient adaptation of pre-trained language models that encompasses most post-training quantization and fine-tuning methods for a budget of $<4$ bits. Pre-trained matrices are decomposed into a high-precision low-rank component and a vector-quantized latent representation (using an autoencoder). During the fine-tuning step, only the low-rank components are updated. Our results show that pre-trained matrices exhibit different patterns, and ReALLM adapts the shape of the encoder (small/large embedding, high-/low-bit VQ, etc.) to each matrix. ReALLM represents each matrix with a small embedding on $b$ bits and a neural decoder model $D_{\phi}$ whose weights are stored on $b_\phi$ bits. Decompressing a matrix requires only one embedding and a single forward pass through the decoder. Our weight-only quantization algorithm yields the best results on both language modeling tasks (C4, WikiText-2) for a budget of $3$ bits *without* any training. With a budget of $2$ bits, ReALLM achieves state-of-the-art performance on understanding tasks (ARC, PiQA, Winogrande, MMLU) as well as generation tasks (TruthfulQA) after fine-tuning on a single partition of the C4 dataset. Additionally, ReALLM is practical in terms of inference latency and memory.
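To make the decomposition concrete, below is a minimal PyTorch sketch of the idea described in the abstract: a pre-trained weight matrix is approximated by a frozen, quantized latent decoded through a small neural decoder $D_\phi$, plus a high-precision low-rank term that is the only part updated during fine-tuning. All class and variable names (`ReALLMLinearSketch`, `latent`, `rank`, the toy linear decoder) are illustrative assumptions, not the authors' implementation; in the paper the latent is stored on $b$ bits and the decoder weights on $b_\phi$ bits, whereas plain float tensors are used here for simplicity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ReALLMLinearSketch(nn.Module):
    """Sketch of one linear layer under the decomposition W ~= D_phi(latent) + A @ B."""

    def __init__(self, latent: torch.Tensor, decoder: nn.Module,
                 out_features: int, in_features: int, rank: int = 16):
        super().__init__()
        # Frozen latent embedding (quantized to b bits in the paper; a float tensor here).
        self.register_buffer("latent", latent)
        # Frozen neural decoder D_phi (its weights would be stored on b_phi bits).
        self.decoder = decoder
        for p in self.decoder.parameters():
            p.requires_grad_(False)
        # High-precision low-rank component: the only trainable part during fine-tuning.
        self.A = nn.Parameter(torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.shape = (out_features, in_features)

    def materialize(self) -> torch.Tensor:
        # Decompression: one embedding, one decoder forward pass, plus the low-rank term.
        return self.decoder(self.latent).reshape(self.shape) + self.A @ self.B

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.materialize())


# Toy usage: the decoder is just a linear map from the latent to the weight entries.
out_f, in_f, latent_dim = 64, 64, 256
decoder = nn.Linear(latent_dim, out_f * in_f)
layer = ReALLMLinearSketch(torch.randn(latent_dim), decoder, out_f, in_f, rank=8)
y = layer(torch.randn(2, in_f))  # shape (2, 64)
```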
Cite
Text
Bedin et al. "ReALLM: A General Framework for LLM Compression and Fine-Tuning." ICLR 2025 Workshops: SLLM, 2025.
Markdown
[Bedin et al. "ReALLM: A General Framework for LLM Compression and Fine-Tuning." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/bedin2025iclrw-reallm/)
BibTeX
@inproceedings{bedin2025iclrw-reallm,
title = {{ReALLM: A General Framework for LLM Compression and Fine-Tuning}},
author = {Bedin, Lisa and Leconte, Louis and Nguyen, Van Minh and Moulines, Eric},
booktitle = {ICLR 2025 Workshops: SLLM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/bedin2025iclrw-reallm/}
}