BitDelta: Your Fine-Tune May Only Be Worth One Bit

Abstract

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demands of pre-training, it is intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. This finding not only highlights the potential redundancy of the information added during fine-tuning, but also has significant implications for multi-tenant serving and multi-tenant storage of fine-tuned models. By enabling the use of a single high-precision base model accompanied by multiple 1-bit deltas, BitDelta dramatically reduces GPU memory requirements by more than 10x, thus also reducing per-user generation latency by more than 10x in multi-tenant settings. We validate BitDelta through experiments across the Llama-2, Mistral, and MPT model families, and on models of up to 70B parameters, showing minimal performance degradation in all tested settings.
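
To make the core operation concrete, below is a minimal PyTorch sketch of the 1-bit delta idea on a single weight matrix. This is an illustration, not the paper's released implementation: the function names are hypothetical, the sign pattern is kept as a dense ±1 tensor rather than packed into actual bits, and the per-matrix scale is initialized as the mean absolute delta, which minimizes the L2 quantization error for a fixed sign pattern (the paper additionally refines these scales, a step omitted here).

import torch

def compress_delta(w_base: torch.Tensor, w_ft: torch.Tensor):
    # Decompose a fine-tuned weight matrix into the frozen base matrix
    # plus a 1-bit delta: a sign pattern and a single scalar scale.
    delta = w_ft - w_base
    scale = delta.abs().mean()  # alpha = mean |delta| minimizes L2 error for fixed signs
    signs = torch.sign(delta)   # +/-1 pattern; 1 bit per parameter once packed
    return scale, signs

def decompress_delta(w_base: torch.Tensor, scale: torch.Tensor, signs: torch.Tensor):
    # Reconstruct an approximation of the fine-tuned weights:
    # W_ft ~= W_base + alpha * sign(W_ft - W_base)
    return w_base + scale * signs

# Toy usage: a small random perturbation stands in for a real fine-tune delta.
w_base = torch.randn(4096, 4096)
w_ft = w_base + 0.01 * torch.randn(4096, 4096)
scale, signs = compress_delta(w_base, w_ft)
w_hat = decompress_delta(w_base, scale, signs)

Each additional fine-tune then costs roughly 1 bit per parameter (plus one scale per matrix) instead of 16, which is where the greater-than-10x memory reduction on top of a shared 16-bit base model comes from.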

Cite

Text

Liu et al. "BitDelta: Your Fine-Tune May Only Be Worth One Bit." Neural Information Processing Systems, 2024. doi:10.52202/079017-0434

Markdown

[Liu et al. "BitDelta: Your Fine-Tune May Only Be Worth One Bit." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/liu2024neurips-bitdelta/) doi:10.52202/079017-0434

BibTeX

@inproceedings{liu2024neurips-bitdelta,
  title     = {{BitDelta: Your Fine-Tune May Only Be Worth One Bit}},
  author    = {Liu, James and Xiao, Guangxuan and Li, Kai and Lee, Jason D. and Han, Song and Dao, Tri and Cai, Tianle},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0434},
  url       = {https://mlanthology.org/neurips/2024/liu2024neurips-bitdelta/}
}