EvoPress: Accurate Dynamic Model Compression via Evolutionary Search
Abstract
The high computational costs of large language models (LLMs) have led to a flurry of research on LLM compression, via methods such as quantization, sparsification, or structured pruning. A new frontier in this area is given by dynamic, non-uniform compression methods, which adjust the compression levels (e.g., sparsity) per-block or even per-layer in order to minimize accuracy loss, while guaranteeing a global compression threshold. Yet, current methods rely on estimating the "importance" of a given layer, implicitly assuming that layers contribute independently to the overall compression error. We begin from the motivating observation that this independence assumption does not generally hold for LLM compression: pruning a model further may even significantly recover performance. To address this, we propose EvoPress, a novel evolutionary framework for dynamic LLM compression. By formulating compression as a general optimization problem, EvoPress identifies optimal compression profiles in a highly efficient manner, and generalizes across diverse models and compression techniques. Via EvoPress, we achieve state-of-the-art performance for structured and unstructured compression of Llama, Mistral, and Phi models.
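To make the idea of searching over per-layer compression profiles under a fixed global budget concrete, here is a toy sketch in Python. It is not the authors' algorithm: the abstract gives no algorithmic details, so the mutation operator, the offspring count, the elitist selection rule, and the `toy_loss` function (including its made-up sensitivities and interaction term) are all illustrative assumptions. Layers are assumed to be equally sized, so keeping the sum of the per-layer levels constant keeps the global compression level constant.

```python
import random


def mutate(profile, max_level, rng):
    """Shift one compression step between two layers, keeping the total fixed."""
    child = list(profile)
    can_increase = [i for i, v in enumerate(child) if v < max_level]
    can_decrease = [i for i, v in enumerate(child) if v > 0]
    if not can_increase or not can_decrease:
        return child
    i = rng.choice(can_increase)
    partners = [j for j in can_decrease if j != i]
    if not partners:
        return child
    j = rng.choice(partners)
    child[i] += 1  # compress layer i more
    child[j] -= 1  # compress layer j less, so the sum is unchanged
    return child


def evolve(loss_fn, start, max_level, generations=200, offspring=8, seed=0):
    """Elitist search: keep the best profile seen, sample offspring from it."""
    rng = random.Random(seed)
    best, best_loss = list(start), loss_fn(start)
    for _ in range(generations):
        children = [mutate(best, max_level, rng) for _ in range(offspring)]
        for child in children:
            loss = loss_fn(child)
            if loss < best_loss:
                best, best_loss = child, loss
    return best, best_loss


def toy_loss(profile):
    """Hypothetical stand-in for a model-quality measurement after compression."""
    sensitivity = [5.0, 1.0, 1.0, 0.5, 0.5, 3.0]  # made-up per-layer weights
    loss = sum(s * v * v for s, v in zip(sensitivity, profile))
    # Made-up interaction term: layers 1 and 2 do not contribute independently.
    loss += 2.0 * profile[1] * profile[2]
    return loss


if __name__ == "__main__":
    uniform = [2] * 6  # uniform compression as the starting profile
    profile, loss = evolve(toy_loss, uniform, max_level=4)
    print(profile, loss, "vs uniform:", toy_loss(uniform))
```

Because the search only compares the loss of whole profiles, it does not need to assume that layers contribute independently to the error, which is the assumption the abstract calls into question. In a real setting `toy_loss` would be replaced by an evaluation of the compressed model on calibration data.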
Cite
Text
Sieberling et al. "EvoPress: Accurate Dynamic Model Compression via Evolutionary Search." ICLR 2025 Workshops: SLLM, 2025.
Markdown
[Sieberling et al. "EvoPress: Accurate Dynamic Model Compression via Evolutionary Search." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/sieberling2025iclrw-evopress/)
BibTeX
@inproceedings{sieberling2025iclrw-evopress,
title = {{EvoPress: Accurate Dynamic Model Compression via Evolutionary Search}},
author = {Sieberling, Oliver and Kuznedelev, Denis and Alistarh, Dan},
booktitle = {ICLR 2025 Workshops: SLLM},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/sieberling2025iclrw-evopress/}
}