Exploring the Better Multimodal Synergy Strategy for Vision-Language Models
Abstract
Vision-Language models (VLMs) have shown great potential in enhancing open-world visual concept comprehension. Recent research has focused on finding an optimal multimodal collaboration strategy that significantly advances CLIP-based few-shot tasks. However, existing prompt-based solutions suffer from unidirectional information flow and increased parameter counts, since they explicitly condition the vision prompts on textual prompts across different transformer layers using non-shareable coupling functions. To address this issue, we propose DsRA, a dual-shared mechanism based on LoRA, for VLM adaptation in low-data regimes. The proposed DsRA enjoys several merits. First, we design an inter-modal shared coefficient that captures visual and textual shared patterns, ensuring effective mutual synergy between image and text features. Second, an intra-modal shared matrix is proposed to achieve parameter-efficient fine-tuning by combining the different coefficients to generate layer-wise adapters placed in the encoder layers. Our extensive experiments demonstrate that DsRA improves generalizability under few-shot classification, base-to-new generalization, and domain generalization settings. Our code will be released soon.
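The abstract does not spell out the exact factorization, so the following is only a minimal sketch of the general idea it describes: low-rank matrices shared across layers within a modality, combined with small per-layer coefficient vectors shared between the vision and text branches to produce layer-wise adapters. All names, shapes, and the diagonal-coefficient composition below are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, num_layers = 64, 8, 4  # hidden size, LoRA rank, encoder depth (illustrative)

# Intra-modal shared matrices: one low-rank pair reused by every layer
# of a modality, instead of a fresh A/B pair per layer (assumed design).
A = rng.standard_normal((r, d)) * 0.01  # shared down-projection
B = rng.standard_normal((d, r)) * 0.01  # shared up-projection

# Inter-modal shared coefficients: one small vector per layer, reused by
# BOTH the vision and text branches so the two encoders adapt jointly.
coeffs = [rng.standard_normal(r) for _ in range(num_layers)]

def layer_adapter(layer_idx):
    """Compose a layer-wise low-rank update: Delta_W = B @ diag(c_l) @ A."""
    return B @ np.diag(coeffs[layer_idx]) @ A

# Adapted forward pass for one layer: frozen weight plus the generated adapter.
W = rng.standard_normal((d, d))  # stand-in for a frozen pretrained weight
x = rng.standard_normal(d)
h = W @ x + layer_adapter(0) @ x
print(h.shape)  # (64,)
```

Sharing A and B across layers keeps the trainable parameter count near 2·d·r plus a handful of r-dimensional coefficient vectors, rather than 2·d·r per layer.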
Cite
Text
Yin et al. "Exploring the Better Multimodal Synergy Strategy for Vision-Language Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34372
Markdown
[Yin et al. "Exploring the Better Multimodal Synergy Strategy for Vision-Language Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/yin2025aaai-exploring-a/) doi:10.1609/AAAI.V39I21.34372
BibTeX
@inproceedings{yin2025aaai-exploring-a,
title = {{Exploring the Better Multimodal Synergy Strategy for Vision-Language Models}},
author = {Yin, Xiaotian and Liu, Xin and Chen, Si and Wang, Yuan and Pan, Yuwen and Zhang, Tianzhu},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {22182-22190},
doi = {10.1609/AAAI.V39I21.34372},
url = {https://mlanthology.org/aaai/2025/yin2025aaai-exploring-a/}
}