FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models
Abstract
Pre-trained foundation models, particularly large language models, have achieved remarkable success and spawned massive numbers of fine-tuned variants. These models are commonly fine-tuned locally and then uploaded by users to cloud platforms such as HuggingFace for secure storage. However, the sheer number of models and their billions of parameters impose a heavy storage burden on clouds with limited resources. Our empirical and theoretical analysis reveals that most fine-tuned models in the cloud differ only slightly (by a small delta) from their pre-trained counterparts. To this end, we propose FM-Delta, a novel lossless compression scheme specifically for storing massive numbers of fine-tuned models in the cloud. FM-Delta maps fine-tuned and pre-trained model parameters into integers with the same number of bits and entropy-codes their integer delta. In this way, the cloud only needs to store one uncompressed pre-trained model and the compressed fine-tuned models. Extensive experiments demonstrate that FM-Delta reduces cloud storage consumption for massive fine-tuned models by an average of around 50%, with only negligible additional time in most end-to-end cases. For example, on up to 10 fine-tuned models in the GPT-NeoX-20B family, FM-Delta reduces the original storage requirement from 423GB to 205GB, significantly saving cloud storage costs.
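The core idea sketched in the abstract can be illustrated as follows. This is a minimal sketch, not the authors' implementation: it assumes fp16 parameters, reinterprets them as 16-bit unsigned integers, takes the wrapping integer delta against the pre-trained weights, and uses `zlib` as a stand-in for the paper's entropy coder. When the fine-tuned weights are close to the pre-trained ones, the deltas cluster near zero and compress well, while decompression recovers the fine-tuned weights bit-exactly.

```python
# Hedged sketch of the integer-delta idea from the abstract (not the
# authors' code): view fp16 parameters as uint16, delta against the
# pre-trained weights, entropy-code the delta (zlib as a stand-in).
import zlib
import numpy as np

def compress_delta(pretrained: np.ndarray, finetuned: np.ndarray) -> bytes:
    """Losslessly compress fine-tuned fp16 weights relative to the base."""
    base = pretrained.astype(np.float16).view(np.uint16)
    ft = finetuned.astype(np.float16).view(np.uint16)
    delta = ft - base  # uint16 subtraction wraps mod 2**16
    return zlib.compress(delta.tobytes())

def decompress_delta(pretrained: np.ndarray, blob: bytes) -> np.ndarray:
    """Reconstruct the fine-tuned weights exactly from base + delta."""
    base = pretrained.astype(np.float16).view(np.uint16)
    delta = np.frombuffer(zlib.decompress(blob), dtype=np.uint16)
    return (base + delta).view(np.float16)
```

Only the compressed delta (plus the single shared pre-trained model) needs to be stored, which mirrors the storage layout the abstract describes.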
Cite
Text
Ning et al. "FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2134
Markdown
[Ning et al. "FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ning2024neurips-fmdelta/) doi:10.52202/079017-2134
BibTeX
@inproceedings{ning2024neurips-fmdelta,
title = {{FM-Delta: Lossless Compression for Storing Massive Fine-Tuned Foundation Models}},
author = {Ning, Wanyi and Wang, Jingyu and Qi, Qi and Zhu, Mengde and Sun, Haifeng and Cheng, Daixuan and Liao, Jianxin and Zhang, Ce},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-2134},
url = {https://mlanthology.org/neurips/2024/ning2024neurips-fmdelta/}
}