Forget the Data and Fine-Tuning! Just Fold the Network to Compress

Abstract

We introduce model folding, a novel data-free model compression technique that merges structurally similar neurons across layers, significantly reducing model size without fine-tuning or access to training data. Unlike existing methods, model folding preserves data statistics during compression by leveraging k-means clustering together with novel data-free techniques that prevent variance collapse or explosion. Our theoretical framework and experiments across standard benchmarks, including ResNet18 and LLaMA-7B, demonstrate that model folding achieves performance comparable to data-driven compression techniques and outperforms recently proposed data-free methods, especially at high sparsity levels. This approach is particularly effective for compressing large-scale models, making it suitable for deployment in resource-constrained environments.
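
To make the core idea concrete, below is a minimal, illustrative sketch (not the authors' implementation) of folding a pair of consecutive fully connected layers: output neurons of the first layer are clustered with k-means on their weight rows, each cluster is replaced by its centroid, and the matching input columns of the next layer are summed to compensate. The function name fold_linear_pair and the parameter num_clusters are our own placeholders, and the sketch assumes plain linear layers with no normalization in between.

import numpy as np
from sklearn.cluster import KMeans

def fold_linear_pair(W1, b1, W2, num_clusters):
    """Fold similar output neurons of layer 1 (rows of W1) via k-means.
    Shapes: W1 (h, d_in), b1 (h,), W2 (d_out, h)."""
    # Cluster neurons by their weight row and bias jointly.
    km = KMeans(n_clusters=num_clusters, n_init=10, random_state=0)
    labels = km.fit_predict(np.concatenate([W1, b1[:, None]], axis=1))

    # Folded first layer: one centroid row (and bias) per cluster.
    W1_f = km.cluster_centers_[:, :-1]
    b1_f = km.cluster_centers_[:, -1]

    # Folded second layer: sum the columns of merged neurons, since each
    # centroid now stands in for all neurons of its cluster.
    W2_f = np.zeros((W2.shape[0], num_clusters))
    for c in range(num_clusters):
        W2_f[:, c] = W2[:, labels == c].sum(axis=1)

    return W1_f, b1_f, W2_f

This toy version requires no data or fine-tuning, but it omits the paper's data-free corrections that keep activation statistics (variance) from collapsing or exploding after folding, which are essential at high sparsity.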

Cite

Text

Wang et al. "Forget the Data and Fine-Tuning! Just Fold the Network to Compress." International Conference on Learning Representations, 2025.

Markdown

[Wang et al. "Forget the Data and Fine-Tuning! Just Fold the Network to Compress." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/wang2025iclr-forget/)

BibTeX

@inproceedings{wang2025iclr-forget,
  title     = {{Forget the Data and Fine-Tuning! Just Fold the Network to Compress}},
  author    = {Wang, Dong and Šikić, Haris and Thiele, Lothar and Saukh, Olga},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/wang2025iclr-forget/}
}