M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning
Abstract
Prompt learning is an effective means of fine-tuning multi-modal foundation models such as CLIP. Despite existing successes, the inner mechanism of multi-modal prompt learning is not well understood. In this work, we identify an inductive bias of multi-modal prompt learning, which we refer to as view bias: the learned prompts may extract only a partial subset of useful features (views) and ignore others. This bias can undermine the model's generalization ability, particularly under distribution shifts. We further observe that independently trained prompts exhibit distinct view biases, contrary to the existing belief that they may converge to similar local optima because they share the same cross-modal representation matching objective. Based on these observations, we propose Multi-modal Matching Multi-Prompt Learning (M$^3$PL), which incorporates multiple paired prompts and a cross-modal contrastive regularizer that encourages the prompt pairs to capture a broader spectrum of views. Extensive experiments show that M$^3$PL effectively boosts the model's generalization capability, achieving state-of-the-art performance under various distribution shifts.
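The abstract describes a cross-modal contrastive regularizer over multiple paired prompts. The paper's actual formulation is not reproduced on this page, but the idea can be sketched as an InfoNCE-style objective over per-prompt-pair features: similarity within a matched image/text prompt pair is pushed up, while similarity across different pairs is pushed down, discouraging the pairs from collapsing onto the same view. All names and shapes below are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a cross-modal multi-prompt regularizer
# (illustrative only; not the authors' implementation).
import torch
import torch.nn.functional as F


def multi_prompt_regularizer(img_feats: torch.Tensor,
                             txt_feats: torch.Tensor,
                             tau: float = 0.07) -> torch.Tensor:
    """img_feats, txt_feats: (K, D) features, one row per prompt pair.

    Returns a symmetric InfoNCE-style loss: the diagonal entries
    (matched pairs) are treated as positives, off-diagonal entries
    (features from different prompt pairs) as negatives.
    """
    img = F.normalize(img_feats, dim=-1)
    txt = F.normalize(txt_feats, dim=-1)
    logits = img @ txt.t() / tau                      # (K, K) cross-modal similarities
    targets = torch.arange(logits.size(0))            # pair i matches pair i
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))


# Toy usage with K = 4 prompt pairs and 512-dim CLIP-like features.
K, D = 4, 512
loss = multi_prompt_regularizer(torch.randn(K, D), torch.randn(K, D))
```

Minimizing such a term jointly with the usual prompt-learning objective would pressure different prompt pairs toward distinct, mutually discriminable features, which is one plausible way to realize the "broader spectrum of views" the abstract refers to.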
Cite
Text
Zhao et al. "M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning." Transactions on Machine Learning Research, 2024.

Markdown
[Zhao et al. "M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning." Transactions on Machine Learning Research, 2024.](https://mlanthology.org/tmlr/2024/zhao2024tmlr-3pl/)

BibTeX
@article{zhao2024tmlr-3pl,
title = {{M$^3$PL: Identifying and Exploiting View Bias of Prompt Learning}},
author = {Zhao, Chujie and Zhang, Tianren and Chen, Guanyu and Jiang, Yizhou and Chen, Feng},
journal = {Transactions on Machine Learning Research},
year = {2024},
url = {https://mlanthology.org/tmlr/2024/zhao2024tmlr-3pl/}
}