Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning

Abstract

Prompt tuning has emerged as a prominent research paradigm for adapting vision-language models to various downstream tasks. However, recent research indicates that prompt tuning methods often lead to overfitting due to limited training samples. In this paper, we propose a Cross-modal Aligned Feature Tuning (CRAFT) method to address this issue. Cross-modal alignment is conducted by first selecting anchors from the alternative domain and deriving relative representations of the embeddings for the selected anchors. Optimizing a feature alignment loss over the anchor-aligned text and image modalities creates a more unified text-image common space. Overfitting in prompt tuning also degrades model performance on out-of-distribution samples. To further improve the prompt model's robustness, we propose minimizing Maximum Mean Discrepancy (MMD) over the anchor-aligned feature spaces to mitigate domain shift. Experiments on four different prompt tuning structures consistently show that our method improves performance, with increases of up to 6.1% in the Base-to-Novel generalization task, 5.8% in the group robustness task, and 2.7% in the out-of-distribution tasks. The code is available at https://github.com/Jingchensun/Craft.
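
The two ingredients named in the abstract, anchor-based relative representations and an MMD penalty, can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the linked repository for that). It assumes relative representations are cosine similarities to a set of anchors, uses an RBF kernel with a hand-picked bandwidth for MMD, and runs on synthetic features. The function names `relative_representation`, `rbf_kernel`, and `mmd2` are hypothetical.

```python
import numpy as np


def relative_representation(features, anchors):
    """Describe each feature by its cosine similarity to every anchor.

    features: (n, d) array, anchors: (k, d) array -> (n, k) array.
    Cosine similarity is an assumption made for this sketch.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return f @ a.T


def rbf_kernel(x, y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of x and the rows of y."""
    sq_dist = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


# Synthetic stand-ins for anchor embeddings and two feature domains.
rng = np.random.default_rng(0)
anchors = rng.normal(size=(8, 16))
source = rng.normal(size=(64, 16))
target = rng.normal(size=(64, 16)) + 2.0  # shifted domain (illustrative only)

rel_source = relative_representation(source, anchors)
rel_target = relative_representation(target, anchors)
shift_penalty = mmd2(rel_source, rel_target)
print("MMD^2 between anchor-aligned domains:", shift_penalty)
```

In a training loop, a quantity like `shift_penalty` would be added to the task loss so that the anchor-aligned features of the two domains are pulled toward the same distribution. How the paper selects anchors, weights the loss terms, and chooses the kernel is not stated in the abstract, so those choices here are placeholders.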

Cite

Text

Sun et al. "Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Sun et al. "Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/sun2025wacv-crossmodal/)

BibTeX

@inproceedings{sun2025wacv-crossmodal,
  title     = {{Cross-Modal Feature Alignment and MMD Improve Robustness of Prompt Tuning}},
  author    = {Sun, Jingchen and Sharma, Rohan and Lokhande, Vishnu and Chen, Changyou},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {4714--4724},
  url       = {https://mlanthology.org/wacv/2025/sun2025wacv-crossmodal/}
}