Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation
Abstract
The adoption of Foundation Models in resource-constrained environments remains challenging due to their large size and inference costs. A promising way to overcome these limitations is post-training compression, which aims to balance reduced model size against performance degradation. This work presents Any Compression via Iterative Pruning (ACIP), a novel algorithmic approach to determine a compression-performance trade-off from a single stochastic gradient descent run. To achieve parameter efficiency, we use an SVD-reparametrization of linear layers and iteratively prune their singular values with a sparsity-inducing penalty. Importantly, the pruning order of the parameters is used to derive a global score map that allows compressing a model to any target size without re-computation. We evaluate ACIP on a large selection of open-weight LLMs and downstream tasks, demonstrating state-of-the-art results compared to existing factorization-based compression methods. We also show that ACIP seamlessly complements common quantization-based compression techniques.
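To make the abstract's description concrete, below is a minimal sketch of the core mechanism, assuming a PyTorch-style implementation: each linear layer is reparametrized via its SVD, singular values are iteratively pruned under a sparsity-inducing penalty, and the pruning order is recorded as a global score map. All names (`SVDLinear`, `iterative_pruning_step`, `compress_to`) and the L1 penalty choice are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of the ACIP idea (not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SVDLinear(nn.Module):
    """Linear layer reparametrized as W = U @ diag(s) @ Vh with prunable s."""

    def __init__(self, linear: nn.Linear):
        super().__init__()
        U, S, Vh = torch.linalg.svd(linear.weight.detach(), full_matrices=False)
        self.U = nn.Parameter(U)    # (out_features, r)
        self.s = nn.Parameter(S)    # (r,) singular values: the pruning targets
        self.Vh = nn.Parameter(Vh)  # (r, in_features)
        self.bias = linear.bias
        self.register_buffer("mask", torch.ones_like(S))  # 0/1 keep-mask on s

    def forward(self, x):
        W = self.U @ torch.diag(self.s * self.mask) @ self.Vh
        return F.linear(x, W, self.bias)

    def sparsity_penalty(self):
        # Sparsity-inducing penalty on the active singular values
        # (plain L1 here, as an illustrative choice).
        return (self.s * self.mask).abs().sum()


def iterative_pruning_step(layers, prune_per_step=1, score_map=None, step=0):
    """Prune the globally smallest active singular values and record the order.

    Values pruned at later steps are more important, so the recorded order
    acts as a global score map: any target size can be met afterwards by
    keeping only the top-scored singular values, with no re-computation.
    """
    score_map = score_map if score_map is not None else {}
    candidates = []
    for name, layer in layers.items():
        active = (layer.mask > 0).nonzero(as_tuple=True)[0]
        for idx in active:
            candidates.append((layer.s[idx].abs().item(), name, idx.item()))
    for _, name, idx in sorted(candidates)[:prune_per_step]:
        layers[name].mask[idx] = 0.0
        score_map[(name, idx)] = step  # earlier pruning step => lower score
    return score_map


def compress_to(layers, score_map, keep_ratio):
    """Re-apply the score map to hit an arbitrary target size after the run."""
    ranked = sorted(score_map.items(), key=lambda kv: kv[1], reverse=True)
    n_keep = int(keep_ratio * len(ranked))
    for (name, idx), _ in ranked:
        layers[name].mask[idx] = 0.0
    for (name, idx), _ in ranked[:n_keep]:
        layers[name].mask[idx] = 1.0
    # Singular values never pruned during the run keep mask = 1 (most important).
```

In an actual run, SGD with the penalty term added to the task loss would shrink unimportant singular values between pruning steps; the recorded pruning order then serves as the global score map from which any compression ratio can be read off without re-running the procedure.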
Cite
Text
Genzel et al. "Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation." Transactions on Machine Learning Research, 2025.

Markdown

[Genzel et al. "Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/genzel2025tmlr-choose/)

BibTeX
@article{genzel2025tmlr-choose,
  title   = {{Choose Your Model Size: Any Compression of Large Language Models Without Re-Computation}},
  author  = {Genzel, Martin and Putzky, Patrick and Zhao, Pengfei and Schulze, Sebastian and Mollenhauer, Mattes and Seidel, Robert and Dietzel, Stefan and Wollmann, Thomas},
  journal = {Transactions on Machine Learning Research},
  year    = {2025},
  url     = {https://mlanthology.org/tmlr/2025/genzel2025tmlr-choose/}
}