Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization

Abstract

Structural pruning enhances hardware-agnostic inference efficiency for large language models (LLMs) yet often fails to maintain comparable performance. Local pruning performs efficient layer-by-layer compression but ignores global topology. Although global pruning aims to identify an optimal sparse model, intuitive methods typically adopt a two-stage paradigm that first evaluates substructure saliency and then applies global pruning; this ignores inter-structure dependencies and fails to achieve end-to-end optimization. To address these limitations, we propose Týr-the-Pruner, an efficient end-to-end search-based global structural pruning framework. This framework constructs a supernet by repeatedly applying local pruning across a range of sparsity ratios to each layer in an LLM, with the core goal of determining the optimal sparsity distribution under a target overall sparsity ratio. Concretely, we introduce an effective local pruning method and an expectation error accumulation approach to improve supernet construction. Furthermore, we employ an iterative prune-and-search strategy with coarse-to-fine sparsity granularity to ensure efficient search convergence. Experimental results show that Týr-the-Pruner achieves state-of-the-art structural pruning, retaining 97% of the dense model's performance while removing a challenging 50% of Llama-3.1-70B's parameters.
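The core search problem the abstract describes — choosing a per-layer sparsity ratio for each layer so that the overall sparsity hits a target while the pruned model's error stays low — can be sketched in miniature. The snippet below is an illustrative toy, not the paper's algorithm: the layer sensitivities and the quadratic error proxy are invented stand-ins for evaluating subnetworks of the supernet, and it uses exhaustive enumeration over a coarse sparsity grid rather than the paper's iterative prune-and-search strategy.

```python
from itertools import product

# Toy per-layer sensitivities (assumed, not from the paper): the pruning error
# of a layer is modeled as sensitivity * sparsity^2.
LAYER_SENSITIVITY = [0.5, 1.0, 2.0, 1.5]
SPARSITY_CHOICES = [0.0, 0.25, 0.5, 0.75]  # coarse per-layer sparsity grid


def proxy_error(distribution):
    """Stand-in for evaluating one subnetwork drawn from the supernet."""
    return sum(s * r * r for s, r in zip(LAYER_SENSITIVITY, distribution))


def search_distribution(target):
    """Return the per-layer sparsity distribution whose mean equals `target`
    and whose proxy error is smallest, by exhaustive enumeration."""
    n = len(LAYER_SENSITIVITY)
    best, best_err = None, float("inf")
    for cand in product(SPARSITY_CHOICES, repeat=n):
        if abs(sum(cand) / n - target) > 1e-9:  # enforce the overall sparsity
            continue
        err = proxy_error(cand)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err


dist, err = search_distribution(0.5)
# The search assigns higher sparsity to less sensitive layers while the mean
# sparsity stays pinned at the 50% target.
```

With these toy sensitivities, the search prunes the most tolerant layer (sensitivity 0.5) at 75% and the most sensitive one (sensitivity 2.0) at only 25%, illustrating why a non-uniform sparsity distribution can outperform uniform pruning at the same overall ratio.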

Cite

Text

Li et al. "Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization." Advances in Neural Information Processing Systems, 2025.

Markdown

[Li et al. "Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/li2025neurips-tyrthepruner/)

BibTeX

@inproceedings{li2025neurips-tyrthepruner,
  title     = {{Týr-the-Pruner: Structural Pruning LLMs via Global Sparsity Distribution Optimization}},
  author    = {Li, Guanchen and Xu, Yixing and Li, Zeping and Liu, Ji and Yin, Xuanwu and Li, Dong and Barsoum, Emad},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/li2025neurips-tyrthepruner/}
}