Multi-Objective One-Shot Pruning for Large Language Models

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks but require substantial computational resources, limiting their deployment in resource-constrained environments. While one-shot pruning methods can reduce model size without expensive retraining, they typically optimize for a single objective, ignoring LLMs' multi-faceted applications. We introduce Multi-Objective One-Shot Pruning (MOSP), which formulates LLM pruning as a multi-objective optimization problem. MOSP efficiently generates a Pareto set of pruned models representing different capability trade-offs, allowing users to select solutions aligned with their preferences. The proposed approach identifies a shared core support across objectives while enabling objective-specific specialized support. Experiments across various LLMs and sparsity levels demonstrate MOSP's superior performance in navigating multi-objective trade-offs compared to baseline methods.
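The shared-core idea from the abstract can be illustrated with a minimal sketch (this is a hypothetical illustration, not the authors' implementation): given per-objective importance scores for each weight, keep a "core" of weights important to every objective, then fill each objective's mask with its own top-scoring remaining weights up to the target density. The function name, score shapes, and the `core_frac` split are all assumptions made for the example.

```python
import numpy as np

def pareto_masks(scores, density=0.5, core_frac=0.6):
    """Sketch of shared-core + specialized pruning supports.

    scores: (n_objectives, n_weights) importance score per objective.
    density: fraction of weights kept in each pruned model.
    core_frac: fraction of each model's kept weights drawn from the shared core.
    Returns boolean keep-masks of shape (n_objectives, n_weights).
    """
    n_obj, n_w = scores.shape
    keep = int(density * n_w)       # weights kept per pruned model
    core_k = int(core_frac * keep)  # size of the shared core support
    # Shared core: weights with the highest worst-case (min over objectives) score,
    # i.e. weights that every objective considers important.
    core = np.argsort(scores.min(axis=0))[-core_k:]
    masks = np.zeros((n_obj, n_w), dtype=bool)
    for i in range(n_obj):
        masks[i, core] = True
        # Specialized support: objective i's best weights outside the core.
        order = np.argsort(scores[i])[::-1]
        extra = order[~masks[i, order]][: keep - core_k]
        masks[i, extra] = True
    return masks
```

Each returned mask keeps exactly `density * n_weights` weights, and all masks agree on at least the `core_frac` shared portion, mirroring the trade-off the abstract describes between common and capability-specific structure.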

Cite

Text

Chen et al. "Multi-Objective One-Shot Pruning for Large Language Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Chen et al. "Multi-Objective One-Shot Pruning for Large Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/chen2025neurips-multiobjective/)

BibTeX

@inproceedings{chen2025neurips-multiobjective,
  title     = {{Multi-Objective One-Shot Pruning for Large Language Models}},
  author    = {Chen, Weiyu and Yang, Hansi and Gou, Yunhao and Shi, Han and Hu, En-Liang and Li, Zhenguo and Kwok, James},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/chen2025neurips-multiobjective/}
}