ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization

Abstract

We introduce ReplaceMe, a generalized training-free depth pruning method that effectively replaces transformer blocks with a linear operation, while maintaining high performance for low compression ratios. In contrast to conventional pruning approaches that require additional training or fine-tuning, our approach requires only a small calibration dataset that is used to estimate a linear transformation, which approximates the pruned blocks. The estimated linear mapping can be seamlessly merged with the remaining transformer blocks, eliminating the need for any additional network parameters. Our experiments show that ReplaceMe consistently outperforms other training-free approaches and remains highly competitive with state-of-the-art pruning methods that involve extensive retraining/fine-tuning and architectural modifications. Applied to several large language models (LLMs), ReplaceMe achieves up to 25% pruning while retaining approximately 90% of the original model’s performance on open benchmarks, without any training or healing steps, resulting in minimal computational overhead. We provide an open-source library implementing ReplaceMe alongside several state-of-the-art depth pruning techniques, available at https://github.com/mts-ai/ReplaceMe.
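The core idea in the abstract, estimating a linear map from calibration activations and folding it into an adjacent weight matrix, can be illustrated with a minimal NumPy sketch. It is not the ReplaceMe library's API. The abstract does not state the estimation objective, so the least-squares fit, the function names `estimate_linear_map` and `merge_into_next`, and the synthetic activations are all assumptions made for illustration.

```python
import numpy as np


def estimate_linear_map(x_in, y_out):
    """Fit T minimizing ||x_in @ T - y_out||_F (assumed least-squares objective).

    x_in:  (n_tokens, d) activations entering the blocks to be pruned.
    y_out: (n_tokens, d) activations leaving those blocks.
    """
    t, *_ = np.linalg.lstsq(x_in, y_out, rcond=None)
    return t


def merge_into_next(t, w_next):
    """Fold T into a following linear layer: (x @ T) @ W == x @ (T @ W)."""
    return t @ w_next


# Synthetic stand-in for calibration data (hypothetical, not real model activations).
rng = np.random.default_rng(0)
n_tokens, d = 256, 8
x_in = rng.standard_normal((n_tokens, d))
true_map = rng.standard_normal((d, d))
y_out = x_in @ true_map + 0.01 * rng.standard_normal((n_tokens, d))

t_hat = estimate_linear_map(x_in, y_out)

# Merging keeps the parameter count unchanged: w_merged has the shape of w_next.
w_next = rng.standard_normal((d, 4))
w_merged = merge_into_next(t_hat, w_next)
```

Because the product of two linear maps is itself a linear map, `w_merged` has the same shape as `w_next`, which is the sense in which the merged model needs no additional parameters. In a real transformer, which weight matrix absorbs the map depends on the architecture; the sketch only shows the algebra.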

Cite

Text

Shopkhoev et al. "ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization." Advances in Neural Information Processing Systems, 2025.

Markdown

[Shopkhoev et al. "ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shopkhoev2025neurips-replaceme/)

BibTeX

@inproceedings{shopkhoev2025neurips-replaceme,
  title     = {{ReplaceMe: Network Simplification via Depth Pruning and Transformer Block Linearization}},
  author    = {Shopkhoev, Dmitriy and Ali, Ammar and Zhussip, Magauiya and Malykh, Valentin and Lefkimmiatis, Stamatios and Komodakis, Nikos and Zagoruyko, Sergey},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shopkhoev2025neurips-replaceme/}
}