Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models

Abstract

Although Large Vision-Language Models (LVLMs) have achieved impressive results, their high computational cost poses a significant barrier to wide application. To improve inference efficiency, most existing approaches reduce computation through either parameter-dependent or token-dependent strategies. However, parameter-dependent methods require retraining LVLMs to recover performance, while token-dependent strategies struggle to consistently select the most relevant tokens. In this paper, we systematically analyze these challenges and provide a series of insights for inference acceleration. Based on these findings, we propose a novel framework, the Pruning All-Rounder (PAR). Unlike previous work, PAR develops a meta-router to adaptively organize pruning flows across both tokens and layers. Trained in a self-supervised manner, our method achieves a superior balance between performance and efficiency. Notably, PAR is highly flexible, offering multiple pruning versions to address a range of acceleration scenarios. The code for this work is publicly available at https://github.com/ASGO-MM/Pruning-All-Rounder.
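
The abstract does not describe how the meta-router is built, so the following is only a toy illustration of what pruning across both tokens and layers can mean, not the authors' method. It assumes per-token importance scores and per-layer gate values are already given (here supplied by hand, standing in for whatever a learned router might produce). All names (`select_tokens`, `run_pruned_forward`, `keep_ratio`, `gate_threshold`) are hypothetical and do not come from the PAR codebase.

```python
import math
from typing import Callable, List, Sequence, Tuple

Layer = Callable[[List[float]], List[float]]


def select_tokens(scores: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the highest-scoring tokens, in their original order."""
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, math.ceil(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_pruned_forward(
    tokens: List[float],
    layers: Sequence[Layer],
    token_scores: Sequence[float],
    layer_gates: Sequence[float],
    keep_ratio: float,
    gate_threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Toy forward pass with token pruning and layer skipping.

    Assumption: scores and gates are precomputed inputs; a real router
    would predict them. Tokens are pruned once, before the first layer.
    """
    kept = select_tokens(token_scores, keep_ratio)
    hidden = [tokens[i] for i in kept]  # token-level pruning
    executed: List[int] = []
    for idx, (layer, gate) in enumerate(zip(layers, layer_gates)):
        if gate < gate_threshold:
            continue  # layer-level pruning: skip this layer entirely
        hidden = layer(hidden)
        executed.append(idx)
    return hidden, executed


if __name__ == "__main__":
    toy_layers = [
        lambda xs: [x + 1 for x in xs],
        lambda xs: [x * 10 for x in xs],
        lambda xs: [x - 1 for x in xs],
    ]
    out, ran = run_pruned_forward(
        tokens=[1.0, 2.0, 3.0, 4.0],
        layers=toy_layers,
        token_scores=[0.1, 0.9, 0.3, 0.7],
        layer_gates=[0.9, 0.2, 0.8],
        keep_ratio=0.5,
    )
    print(out, ran)
```

In this sketch, halving the tokens and skipping one of three layers reduces the work done by each remaining layer and the number of layers run; how to choose those scores and gates without hurting accuracy is the problem the paper addresses.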

Cite

Text

Suo et al. "Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models." International Conference on Computer Vision, 2025.

Markdown

[Suo et al. "Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/suo2025iccv-pruning/)

BibTeX

@inproceedings{suo2025iccv-pruning,
  title     = {{Pruning All-Rounder: Rethinking and Improving Inference Efficiency for Large Vision Language Models}},
  author    = {Suo, Wei and Ma, Ji and Sun, Mengyang and Wu, Lin Yuanbo and Wang, Peng and Zhang, Yanning},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {20247-20256},
  url       = {https://mlanthology.org/iccv/2025/suo2025iccv-pruning/}
}