FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models

Abstract

Pre-trained foundation models, particularly large language models, have achieved remarkable success and given rise to a massive number of fine-tuned variants. These models are commonly fine-tuned locally and then uploaded by users to cloud platforms such as HuggingFace for secure storage. However, the huge number of models and their billions of parameters impose heavy storage overhead on cloud platforms with limited resources. Our empirical and theoretical analysis reveals that most fine-tuned models in the cloud differ only slightly (by a small delta) from their pre-trained models. To this end, we propose FM-Delta, a novel lossless compression scheme specifically for storing massive fine-tuned models in the cloud. FM-Delta maps fine-tuned and pre-trained model parameters into integers with the same number of bits and entropy-codes their integer delta. In this way, the cloud only needs to store one uncompressed pre-trained model plus the compressed fine-tuned models. Extensive experiments have demonstrated that FM-Delta reduces cloud storage consumption for massive fine-tuned models by an average of around 50%, with only negligible additional time in most end-to-end cases. For example, on up to 10 fine-tuned models in the GPT-NeoX-20B family, FM-Delta reduces the original storage requirement from 423GB to 205GB, significantly saving cloud storage costs.
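The core idea in the abstract can be illustrated with a minimal sketch: reinterpret the float parameters of both models as same-width integers, take their integer delta, and pass the delta to an entropy coder. The function names below are hypothetical, and `zlib` stands in for a generic entropy coder; the paper's actual FM-Delta codec may differ in how it maps floats to integers and which coder it uses.

```python
# Illustrative sketch of delta-then-entropy-code compression for fine-tuned
# weights. Assumes float32 parameters; names and the choice of zlib as the
# entropy coder are illustrative, not the paper's exact implementation.
import zlib
import numpy as np

def compress_delta(pretrained: np.ndarray, finetuned: np.ndarray) -> bytes:
    # Reinterpret float32 parameters as 32-bit unsigned integers
    # (same bits, same width).
    base = np.ascontiguousarray(pretrained, dtype=np.float32).view(np.uint32)
    ft = np.ascontiguousarray(finetuned, dtype=np.float32).view(np.uint32)
    # XOR delta: when the two models are close, most high-order bits agree,
    # so the delta is dominated by zeros that an entropy coder exploits.
    delta = np.bitwise_xor(base, ft)
    return zlib.compress(delta.tobytes())

def decompress_delta(pretrained: np.ndarray, blob: bytes) -> np.ndarray:
    # Invert the pipeline: decode the delta, XOR it back onto the
    # pre-trained integers, and reinterpret as float32. Lossless by
    # construction, since every step is bit-exact.
    base = np.ascontiguousarray(pretrained, dtype=np.float32).view(np.uint32)
    delta = np.frombuffer(zlib.decompress(blob), dtype=np.uint32)
    return np.bitwise_xor(base, delta).view(np.float32)
```

Under this scheme the cloud stores the pre-trained weights once, and each fine-tuned variant only as its compressed delta blob, which is exactly the storage layout the abstract describes.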

Cite

Text

Ning et al. "FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2134

Markdown

[Ning et al. "FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ning2024neurips-fmdelta/) doi:10.52202/079017-2134

BibTeX

@inproceedings{ning2024neurips-fmdelta,
  title     = {{FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models}},
  author    = {Ning, Wanyi and Wang, Jingyu and Qi, Qi and Zhu, Mengde and Sun, Haifeng and Cheng, Daixuan and Liao, Jianxin and Zhang, Ce},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2134},
  url       = {https://mlanthology.org/neurips/2024/ning2024neurips-fmdelta/}
}