MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Abstract
While excellent in transfer learning, Vision-Language models (VLMs) come with high computational costs due to their large number of parameters. To address this issue, removing parameters via model pruning is a viable solution. However, existing techniques for VLMs are task-specific, and thus require pruning the network from scratch for each new task of interest. In this work, we explore a new direction: Task-Agnostic Vision-Language Pruning (TA-VLP). Given a pretrained VLM, the goal is to find a unique pruned counterpart transferable to multiple unknown downstream tasks. In this challenging setting, the transferable representations already encoded in the pretrained model are a key aspect to preserve. Thus, we propose Multimodal Flow Pruning (MULTIFLOW), a first gradient-free pruning framework for TA-VLP where: (i) the importance of a parameter is expressed in terms of its magnitude and its information flow, by incorporating the saliency of the neurons it connects; and (ii) pruning is driven by the emergent (multimodal) distribution of the VLM parameters after pretraining. We benchmark eight state-of-the-art pruning algorithms in the context of TA-VLP, experimenting with two VLMs, three vision-language tasks, and three pruning ratios. Our experimental results show that MULTIFLOW outperforms recent sophisticated combinatorial competitors in the vast majority of cases, paving the way towards addressing TA-VLP. The code is publicly available at https://github.com/FarinaMatteo/multiflow.
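To make point (i) of the abstract concrete, the following is a minimal sketch of a gradient-free score that combines a weight's magnitude with the saliency of the two neurons it connects. It is not the authors' implementation (see the repository linked above for that). The function names, the choice of mean absolute weight as the neuron saliency, and the single-layer global threshold are all assumptions made for illustration, and point (ii), the distribution-driven allocation of sparsity, is not covered.

```python
import numpy as np


def neuron_saliency(weight):
    """Hypothetical saliency: mean absolute weight attached to each neuron.

    weight has shape (out_features, in_features).
    """
    in_saliency = np.abs(weight).mean(axis=0)   # one value per input neuron
    out_saliency = np.abs(weight).mean(axis=1)  # one value per output neuron
    return in_saliency, out_saliency


def flow_scores(weight, in_saliency, out_saliency):
    """Score each weight by |w| scaled by the saliency of the neurons it connects."""
    return out_saliency[:, None] * np.abs(weight) * in_saliency[None, :]


def prune_mask(scores, sparsity):
    """Return a boolean mask that drops the lowest-scoring fraction of weights."""
    flat = scores.ravel()
    n_pruned = int(round(sparsity * flat.size))
    order = np.argsort(flat, kind="stable")  # ascending: least important first
    mask = np.ones(flat.size, dtype=bool)
    mask[order[:n_pruned]] = False
    return mask.reshape(scores.shape)


# Toy layer with 2 inputs and 2 outputs (illustrative values only).
weight = np.array([[1.0, -2.0],
                   [3.0, 4.0]])
in_sal, out_sal = neuron_saliency(weight)
scores = flow_scores(weight, in_sal, out_sal)
mask = prune_mask(scores, sparsity=0.5)
pruned_weight = weight * mask
```

In this toy layer, the two weights feeding the less salient output neuron receive the lowest scores and are removed at 50% sparsity. No gradients or task labels are needed, which is what makes a score of this kind usable in a task-agnostic setting.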
Cite
Text
Farina et al. "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01532
Markdown
[Farina et al. "MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/farina2024cvpr-multiflow/) doi:10.1109/CVPR52733.2024.01532
BibTeX
@inproceedings{farina2024cvpr-multiflow,
title = {{MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning}},
author = {Farina, Matteo and Mancini, Massimiliano and Cunegatti, Elia and Liu, Gaowen and Iacca, Giovanni and Ricci, Elisa},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {16185-16195},
doi = {10.1109/CVPR52733.2024.01532},
url = {https://mlanthology.org/cvpr/2024/farina2024cvpr-multiflow/}
}