MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

Matteo Farina, Massimiliano Mancini, Elia Cunegatti, Gaowen Liu, Giovanni Iacca, Elisa Ricci

CVPR 2024 pp. 16185-16195

doi:10.1109/CVPR52733.2024.01532

Abstract

While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first, gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated, combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
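Point (i) of the abstract can be illustrated with a small, gradient-free scoring sketch. This is not the authors' implementation: the function names (`flow_scores`, `prune_mask`, `mean_abs`), the product form of the score, and the use of mean absolute weight as neuron saliency are all assumptions made for illustration, and point (ii), the distribution-driven allocation of sparsity, is not covered here.

```python
def flow_scores(weight, in_saliency, out_saliency):
    """Score each weight by its magnitude scaled by the saliency of the
    input and output neurons it connects (assumed product form)."""
    return [[out_saliency[i] * abs(w) * in_saliency[j] for j, w in enumerate(row)]
            for i, row in enumerate(weight)]


def prune_mask(scores, sparsity):
    """Return a boolean mask that drops the lowest-scoring fraction of weights."""
    flat = sorted((s, i, j) for i, row in enumerate(scores) for j, s in enumerate(row))
    n_drop = round(sparsity * len(flat))
    dropped = {(i, j) for _, i, j in flat[:n_drop]}
    return [[(i, j) not in dropped for j in range(len(row))]
            for i, row in enumerate(scores)]


def mean_abs(values):
    """Hypothetical saliency proxy: mean absolute value of connected weights."""
    return sum(abs(v) for v in values) / len(values)


# Toy layer with 3 input neurons and 2 output neurons (rows are outputs).
W = [[0.9, -0.1, 0.4],
     [0.05, 0.8, -0.3]]

in_sal = [mean_abs(col) for col in zip(*W)]   # one value per input neuron
out_sal = [mean_abs(row) for row in W]        # one value per output neuron

scores = flow_scores(W, in_sal, out_sal)
mask = prune_mask(scores, sparsity=0.5)
pruned = [[w if keep else 0.0 for w, keep in zip(w_row, m_row)]
          for w_row, m_row in zip(W, mask)]
```

No gradients or task labels are needed to compute these scores, which is what makes a criterion of this kind usable before the downstream task is known. The released code at the repository linked above is the reference for the actual scoring rule.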


Cite

Text

Farina et al. "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01532

Markdown

[Farina et al. "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/farina2024cvpr-multiflow/) doi:10.1109/CVPR52733.2024.01532

BibTeX

@inproceedings{farina2024cvpr-multiflow,
  title     = {{MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning}},
  author    = {Farina, Matteo and Mancini, Massimiliano and Cunegatti, Elia and Liu, Gaowen and Iacca, Giovanni and Ricci, Elisa},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {16185-16195},
  doi       = {10.1109/CVPR52733.2024.01532},
  url       = {https://mlanthology.org/cvpr/2024/farina2024cvpr-multiflow/}
}