VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping

Zheng Chen, Yu Zeng, Zehui Chen, Hongzhi Gao, Lin Chen, Jiaming Liu, Feng Zhao

AAAI 2025 pp. 2385-2393

doi:10.1609/AAAI.V39I3.32239

Abstract

Although pre-trained large vision foundation models (VFMs) yield superior results on various downstream tasks, full fine-tuning is often impractical due to its high computational cost and storage requirements. Recent advancements in parameter-efficient fine-tuning (PEFT) of VFMs for image classification show significant promise. However, the application of PEFT techniques to dense prediction tasks remains largely unexplored. Our analysis of existing methods reveals that their underlying premise of using low-rank parameter matrices, despite its efficacy in specific applications, may not be well suited to dense prediction tasks. To this end, we propose a novel PEFT approach tailored for dense prediction tasks, namely VFM-Adapter. Specifically, the VFM-Adapter introduces a hybrid operation mapping technique that integrates local information with global modeling in the adapter module, capitalizing on the distinct inductive biases inherent in different operations. Additionally, we dynamically generate parameters for the VFM-Adapter, making feature extraction adapt to each specific input. To validate the efficacy of VFM-Adapter, we conduct extensive experiments across object detection, semantic segmentation, and instance segmentation tasks. Results on multiple benchmarks consistently demonstrate the superiority of our method over previous approaches. Notably, with only three percent of the trainable parameters of the SAM-Base backbone, our approach achieves competitive or even superior performance compared to full fine-tuning. The code will be made available.

Cite

Text

Chen et al. "VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I3.32239

Markdown

[Chen et al. "VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/chen2025aaai-vfm/) doi:10.1609/AAAI.V39I3.32239

BibTeX

@inproceedings{chen2025aaai-vfm,
  title     = {{VFM-Adapter: Adapting Visual Foundation Models for Dense Prediction with Dynamic Hybrid Operation Mapping}},
  author    = {Chen, Zheng and Zeng, Yu and Chen, Zehui and Gao, Hongzhi and Chen, Lin and Liu, Jiaming and Zhao, Feng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2385--2393},
  doi       = {10.1609/AAAI.V39I3.32239},
  url       = {https://mlanthology.org/aaai/2025/chen2025aaai-vfm/}
}