BMIP: Bi-Directional Modality Interaction Prompt Learning for VLM
Abstract
Vision-language models (VLMs) have exhibited remarkable generalization capabilities, and prompt learning for VLMs has attracted great attention for the ability to adapt pre-trained VLMs to specific downstream tasks. However, existing studies mainly focus on single-modal prompts or uni-directional modality interaction, overlooking the powerful alignment effects resulting from the interaction between the vision and language modalities. To this end, we propose a novel prompt learning method called Bi-directional Modality Interaction Prompt (BMIP), which dynamically weights bi-modal information through learning the information of the attention layer, enhancing trainability and inter-modal consistency compared to simple information aggregation methods. To evaluate the effectiveness of prompt learning methods, we propose a more realistic evaluation paradigm called open-world generalization complementing the widely adopted cross-dataset transfer and domain generalization tasks. Comprehensive experiments on various datasets reveal that BMIP not only outperforms current state-of-the-art methods across all three evaluation paradigms but is also flexible enough to be combined with other prompt-based methods for consistent performance enhancement.
Cite
Text
Lv et al. "BMIP: Bi-Directional Modality Interaction Prompt Learning for VLM." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/655
Markdown
[Lv et al. "BMIP: Bi-Directional Modality Interaction Prompt Learning for VLM." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/lv2025ijcai-bmip/) doi:10.24963/IJCAI.2025/655
BibTeX
@inproceedings{lv2025ijcai-bmip,
title = {{BMIP: Bi-Directional Modality Interaction Prompt Learning for VLM}},
author = {Lv, Song-Lin and Chen, Yu-Yang and Zhou, Zhi and Yang, Ming and Guo, Lan-Zhe},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2025},
pages = {5887-5895},
doi = {10.24963/IJCAI.2025/655},
url = {https://mlanthology.org/ijcai/2025/lv2025ijcai-bmip/}
}