BitDelta: Your Fine-Tune May Only Be Worth One Bit

James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

NeurIPS 2024

doi:10.52202/079017-0434

Abstract

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks. Given the higher computational demand of pre-training, it is intuitive to assume that fine-tuning adds less new information to the model, and is thus more compressible. We explore this assumption by decomposing the weights of fine-tuned models into their pre-trained components and an additional delta. We introduce a simple method, BitDelta, which successfully quantizes this delta down to 1 bit without compromising performance. This interesting finding not only highlights the potential redundancy of information added during fine-tuning, but also has significant implications for the multi-tenant serving and multi-tenant storage of fine-tuned models. By enabling the use of a single high-precision base model accompanied by multiple 1-bit deltas, BitDelta dramatically reduces GPU memory requirements by more than 10x, thus reducing per-user generation latency by more than 10x in multi-tenant settings. We validate BitDelta through experiments across Llama-2, Mistral and MPT model families, and on models up to 70B parameters, showcasing minimal performance degradation in all tested settings.
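The decomposition described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the 1-bit delta is stored as a sign matrix plus one scalar scale per weight matrix, and that the scale is initialized to the mean absolute value of the delta. The names `compress_delta` and `reconstruct` are hypothetical, and plain Python lists stand in for real weight tensors.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def compress_delta(base: Matrix, finetuned: Matrix) -> Tuple[List[List[int]], float]:
    """Return (sign matrix in {-1, +1}, scalar scale) for one weight matrix."""
    # Delta between the fine-tuned weights and the pre-trained weights.
    delta = [[f - b for f, b in zip(f_row, b_row)]
             for f_row, b_row in zip(finetuned, base)]
    n = sum(len(row) for row in delta)
    # Assumption: scale = mean |delta|, which minimizes the L2 error of
    # scale * sign(delta) against delta for a fixed sign pattern.
    scale = sum(abs(d) for row in delta for d in row) / n
    # One bit per weight: +1 for positive entries, -1 otherwise.
    signs = [[1 if d > 0 else -1 for d in row] for row in delta]
    return signs, scale


def reconstruct(base: Matrix, signs: List[List[int]], scale: float) -> Matrix:
    """Approximate the fine-tuned weights as base + scale * sign(delta)."""
    return [[b + scale * s for b, s in zip(b_row, s_row)]
            for b_row, s_row in zip(base, signs)]


# Toy 2x2 example with made-up weights.
base = [[0.50, -0.20], [0.10, 0.30]]
finetuned = [[0.53, -0.21], [0.08, 0.34]]

signs, scale = compress_delta(base, finetuned)
approx = reconstruct(base, signs, scale)
```

Under these assumptions, each fine-tune needs only one bit per weight plus one scalar per matrix on top of the shared base model. If the original fine-tuned weights are stored in 16 bits, that is roughly a 16x smaller per-model footprint, which is in line with the "more than 10x" figure quoted in the abstract.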


Cite

Text

Liu et al. "BitDelta: Your Fine-Tune May Only Be Worth One Bit." Neural Information Processing Systems, 2024. doi:10.52202/079017-0434

Markdown

[Liu et al. "BitDelta: Your Fine-Tune May Only Be Worth One Bit." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/liu2024neurips-bitdelta/) doi:10.52202/079017-0434

BibTeX

@inproceedings{liu2024neurips-bitdelta,
  title     = {{BitDelta: Your Fine-Tune May Only Be Worth One Bit}},
  author    = {Liu, James and Xiao, Guangxuan and Li, Kai and Lee, Jason D. and Han, Song and Dao, Tri and Cai, Tianle},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0434},
  url       = {https://mlanthology.org/neurips/2024/liu2024neurips-bitdelta/}
}