ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization
Abstract
We introduce ReplaceMe, a generalized training-free depth pruning method that effectively replaces transformer blocks with a linear operation, while maintaining high performance for low compression ratios. In contrast to conventional pruning approaches that require additional training or fine-tuning, our approach requires only a small calibration dataset that is used to estimate a linear transformation, which approximates the pruned blocks. The estimated linear mapping can be seamlessly merged with the remaining transformer blocks, eliminating the need for any additional network parameters. Our experiments show that ReplaceMe consistently outperforms other training-free approaches and remains highly competitive with state-of-the-art pruning methods that involve extensive retraining/fine-tuning and architectural modifications. Applied to several large language models (LLMs), ReplaceMe achieves up to 25% pruning while retaining approximately 90% of the original model’s performance on open benchmarks, without any training or healing steps, resulting in minimal computational overhead. We provide an open-source library implementing ReplaceMe alongside several state-of-the-art depth pruning techniques, available at https://github.com/mts-ai/ReplaceMe.
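The core idea of estimating a linear map from calibration activations and folding it into an adjacent weight matrix can be illustrated with a toy sketch. The abstract does not state the estimation objective, so the ordinary least-squares fit below is an assumption, and every name here (`X`, `Y`, `T_hat`, `W`, `W_merged`) is hypothetical, with random arrays standing in for real activations. It is not the API of the ReplaceMe library.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 16, 512  # toy hidden size and number of calibration tokens

# Stand-ins for calibration activations (assumption: random data, not a real model).
# X: hidden states entering the span of pruned blocks.
# Y: hidden states leaving that span in the original model.
X = rng.standard_normal((n, d))
T_true = rng.standard_normal((d, d))
Y = X @ T_true + 0.01 * rng.standard_normal((n, d))

# Assumed objective: least-squares estimate of T such that X @ T ≈ Y.
T_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Relative approximation error on the calibration set.
rel_err = np.linalg.norm(X @ T_hat - Y) / np.linalg.norm(Y)

# Merging: if the layer feeding the pruned span ends in a linear projection W,
# then (H @ W) @ T equals H @ (W @ T), so T can be folded into W
# without adding any parameters.
W = rng.standard_normal((4 * d, d))
W_merged = W @ T_hat
H = rng.standard_normal((8, 4 * d))
merge_ok = np.allclose((H @ W) @ T_hat, H @ W_merged)
```

In this sketch the merge is exact only because the preceding operation is purely linear; where the map is applied in a real transformer block, and how residual connections are handled, are details the abstract leaves to the paper.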
Cite
Text
Shopkhoev et al. "ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization." Advances in Neural Information Processing Systems, 2025.
Markdown
[Shopkhoev et al. "ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shopkhoev2025neurips-replaceme/)
BibTeX
@inproceedings{shopkhoev2025neurips-replaceme,
title = {{ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization}},
author = {Shopkhoev, Dmitriy and Ali, Ammar and Zhussip, Magauiya and Malykh, Valentin and Lefkimmiatis, Stamatios and Komodakis, Nikos and Zagoruyko, Sergey},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/shopkhoev2025neurips-replaceme/}
}