Double-Filter: Efficient Fine-Tuning of Pre-Trained Vision-Language Models via Patch&Layer Filtering

He, Yaoqin; Fu, Junchen; Zheng, Kaiwen; Xu, Songpei; Chen, Fuhai; Li, Jie; Jose, Joemon M.; Ge, Xuri

ICML 2025 pp. 22407-22421

Abstract

We present Double-Filter, an approach that “slims down” the fine-tuning of vision-language pre-trained (VLP) models by filtering redundancy in both the input features and the architectural components. It has two parts. First, a new patch selection method filters image patches by separating background from foreground and then applies a refined patch selection step. Second, a genetic algorithm eliminates redundant fine-grained architecture layers, improving the efficiency and effectiveness of the model. The patch filter makes the selected patches semantically more comprehensive, improving inference efficiency while preserving the semantic representation. The fine-grained layer filter removes as much architectural redundancy as possible while limiting the impact on performance. Experimental results show that Double-Filter fine-tunes more efficiently than advanced efficient fine-tuning methods while maintaining competitive performance on three downstream tasks: VQA, NLVR, and retrieval. It is also shown to be effective with both the METER and ViLT VLP models.
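To make the layer-filtering idea concrete, the following is a minimal sketch of a genetic search over a binary keep/drop mask for layers. It is not the authors' implementation: the function `evolve_layer_mask`, its parameters, and the `toy_fitness` scoring rule are all hypothetical, and a real fitness function would presumably trade validation accuracy of the fine-tuned model against compute cost.

```python
import random


def evolve_layer_mask(num_layers, fitness, pop_size=20, generations=30,
                      mutation_rate=0.1, seed=0):
    """Search for a 0/1 mask over layers (1 = keep) that maximizes `fitness`.

    Hypothetical helper for illustration; assumes num_layers >= 2 and
    pop_size >= 4.
    """
    rng = random.Random(seed)
    # Start from the full model (all layers kept) plus random masks.
    population = [[1] * num_layers]
    while len(population) < pop_size:
        population.append([rng.randint(0, 1) for _ in range(num_layers)])

    for _ in range(generations):
        # Selection: keep the fitter half unchanged (elitism).
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # One-point crossover between two parents.
            cut = rng.randint(1, num_layers - 1)
            child = a[:cut] + b[cut:]
            # Mutation: flip each bit with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        population = parents + children

    return max(population, key=fitness)


def toy_fitness(mask):
    """Assumed stand-in score: layers 0, 3, 7 help; every kept layer costs 0.1."""
    important = (0, 3, 7)
    gain = sum(1.0 for i in important if mask[i])
    cost = 0.1 * sum(mask)
    return gain - cost


best = evolve_layer_mask(12, toy_fitness)
print(best, toy_fitness(best))
```

Because the fitter half of each generation is carried over unchanged and the initial population contains the all-ones mask, the returned mask never scores below the unpruned baseline under the chosen fitness.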

Cite

Text

He et al. "Double-Filter: Efficient Fine-Tuning of Pre-Trained Vision-Language Models via Patch&Layer Filtering." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[He et al. "Double-Filter: Efficient Fine-Tuning of Pre-Trained Vision-Language Models via Patch&Layer Filtering." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/he2025icml-doublefilter/)

BibTeX

@inproceedings{he2025icml-doublefilter,
  title     = {{Double-Filter: Efficient Fine-Tuning of Pre-Trained Vision-Language Models via Patch\&Layer Filtering}},
  author    = {He, Yaoqin and Fu, Junchen and Zheng, Kaiwen and Xu, Songpei and Chen, Fuhai and Li, Jie and Jose, Joemon M. and Ge, Xuri},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {22407--22421},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/he2025icml-doublefilter/}
}