Variational Prompt Tuning Improves Generalization of Vision-Language Foundation Models

Abstract

Using prompt tuning, large vision-language foundation models can be adapted to downstream tasks by treating part of the input language prompts as learnable parameters while freezing the rest. However, existing work on prompt tuning may damage the generalization capabilities of foundation models. To avoid such limitations, we propose a probabilistic modeling of the underlying distribution of prompts, allowing prompts within the support of an associated concept to be derived through stochastic sampling. This results in a more complete and richer transfer of the information captured by the language model, providing better generalization capabilities for downstream tasks. The resulting algorithm relies on a simple yet powerful variational framework that can be directly integrated with other developments. We show our approach integrates seamlessly into both standard and conditional prompt learning frameworks, improving the performance in both cases considerably, especially with regard to preserving the generalization capability of the original model. Our method provides the current state-of-the-art for prompt learning, surpassing CoCoOp by 1.6% average Top-1 accuracy on the standard benchmark. Remarkably, it even surpasses the original CLIP model in terms of generalization to new classes. The implementation code will be released.
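The core idea in the abstract can be sketched in a few lines: instead of a single learnable prompt, parameterize a Gaussian over the context-token embeddings, draw samples via the reparameterization trick, and regularize with a KL term. This is a minimal NumPy sketch under assumed names and dimensions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

n_ctx, dim = 4, 512  # assumed: number of learnable context tokens, embedding size

# Variational parameters of the prompt distribution (hypothetical initialization).
mu = rng.normal(scale=0.02, size=(n_ctx, dim))
log_sigma = np.full((n_ctx, dim), -2.0)

def sample_prompt(mu, log_sigma, rng):
    """Reparameterization trick: prompt = mu + sigma * eps, with eps ~ N(0, I),
    so gradients can flow to mu and log_sigma during training."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps

def kl_to_standard_normal(mu, log_sigma):
    """KL( N(mu, sigma^2) || N(0, I) ): the variational regularizer added
    to the downstream classification loss."""
    return 0.5 * np.sum(np.exp(2 * log_sigma) + mu**2 - 1.0 - 2 * log_sigma)

# Each draw is a different prompt within the support of the learned concept;
# in the full method these would be prepended to class-name token embeddings.
prompts = [sample_prompt(mu, log_sigma, rng) for _ in range(3)]
```

At test time one can average predictions over several sampled prompts, which is what yields the richer, more robust transfer the abstract describes.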

Cite

Text

Derakhshani et al. "Variational Prompt Tuning Improves Generalization of Vision-Language Foundation Models." ICLR 2023 Workshops: ME-FoMo, 2023.

Markdown

[Derakhshani et al. "Variational Prompt Tuning Improves Generalization of Vision-Language Foundation Models." ICLR 2023 Workshops: ME-FoMo, 2023.](https://mlanthology.org/iclrw/2023/derakhshani2023iclrw-variational/)

BibTeX

@inproceedings{derakhshani2023iclrw-variational,
  title     = {{Variational Prompt Tuning Improves Generalization of Vision-Language Foundation Models}},
  author    = {Derakhshani, Mohammad Mahdi and Sanchez, Enrique and Bulat, Adrian and da Costa, Victor Guilherme Turrisi and Snoek, Cees G. M. and Tzimiropoulos, Georgios and Martinez, Brais},
  booktitle = {ICLR 2023 Workshops: ME-FoMo},
  year      = {2023},
  url       = {https://mlanthology.org/iclrw/2023/derakhshani2023iclrw-variational/}
}