Unified Knowledge Maintenance Pruning and Progressive Recovery with Weight Recalling for Large Vision-Language Models

Abstract

Large Vision-Language Models (LVLMs), which leverage a Large Language Model (LLM) as the cognitive core, have recently become one of the most representative multimodal model paradigms. However, as the unimodal branches, i.e., the visual encoder and the LLM, grow larger, the storage and computational burdens intensify, posing challenges for deployment. Structured pruning has proven promising for compressing large models by trimming a large portion of insignificant network structures. Nevertheless, most existing methods are designed for LLMs: they either rely on unitary importance metrics that fail to handle modality-wise imbalances, or adopt generic pruning and recovery paradigms that overlook the unique calibration status and capability requirements of large models, leading to substantial performance degradation. To address these issues, we propose a novel structured pruning approach for LVLMs, dubbed Unified Knowledge Maintenance Pruning and Progressive Recovery with Weight Recalling (UKMP). Specifically, we design a Unified Knowledge Maintenance Importance (UKMI) metric, which jointly balances block-wise and modality-wise importance via adaptive normalization, improves importance estimation by refining gradient-based criteria, and maintains the knowledge capacity of LVLMs by using the information entropy of the angle distribution. Moreover, we develop a LoRA-based Progressive Distillation (LPD) method that recalls the pruned weights and performs progressive distillation for comprehensive recovery. Extensive experiments across various vision-language tasks demonstrate the effectiveness of our approach compared to state-of-the-art structured pruning methods.
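
The abstract does not include pseudocode, so the sketch below is an illustration only: a hypothetical gradient-based, block-wise normalized channel-importance score combined with an entropy term, loosely in the spirit of the UKMI metric described above. The function name `block_importance`, the cosine-similarity entropy proxy, and the equal weighting of the two terms are assumptions for illustration, not the authors' formulation.

```python
import torch
import torch.nn.functional as F


def block_importance(weight: torch.Tensor, grad: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Per-output-channel importance for one linear block (hypothetical sketch)."""
    # First-order (gradient * weight) saliency, summed over the input dimension.
    saliency = (weight * grad).abs().sum(dim=1)
    # Block-wise normalization so that scores from different blocks/modalities
    # live on a comparable scale before being ranked globally.
    saliency = saliency / (saliency.sum() + eps)
    # Entropy over pairwise channel-direction similarities, used here as a crude
    # stand-in for an "angle distribution information entropy" term.
    directions = F.normalize(weight, dim=1)
    probs = torch.softmax(directions @ directions.t(), dim=1)
    entropy = -(probs * probs.clamp_min(eps).log()).sum(dim=1)
    entropy = entropy / (entropy.sum() + eps)
    # Equal weighting of the two terms is an assumption for illustration.
    return saliency + entropy


# Toy usage: score 128 output channels of a 128x256 weight and keep the top 96.
w = torch.randn(128, 256, requires_grad=True)
(w.sum() ** 2).backward()                       # any scalar calibration loss would do
scores = block_importance(w.detach(), w.grad)
keep_idx = scores.topk(k=96).indices            # channels retained after structured pruning
```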

Cite

Text

Wu et al. "Unified Knowledge Maintenance Pruning and Progressive Recovery with Weight Recalling for Large Vision-Language Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32923

Markdown

[Wu et al. "Unified Knowledge Maintenance Pruning and Progressive Recovery with Weight Recalling for Large Vision-Language Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wu2025aaai-unified/) doi:10.1609/AAAI.V39I8.32923

BibTeX

@inproceedings{wu2025aaai-unified,
  title     = {{Unified Knowledge Maintenance Pruning and Progressive Recovery with Weight Recalling for Large Vision-Language Models}},
  author    = {Wu, Zimeng and Chen, Jiaxin and Wang, Yunhong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8550--8558},
  doi       = {10.1609/AAAI.V39I8.32923},
  url       = {https://mlanthology.org/aaai/2025/wu2025aaai-unified/}
}