Prompt-Aligned Gradient for Prompt Tuning

Abstract

Thanks to large pre-trained vision-language models (VLMs) like CLIP, we can craft a zero-shot classifier by discrete prompt design, e.g., the confidence score of an image being "[CLASS]" can be obtained from the VLM-provided similarity between the image and the prompt sentence "a photo of a [CLASS]". Furthermore, prompting shows great potential for fast adaptation of VLMs to downstream tasks if we fine-tune the soft prompts with few samples. However, we find a common failure: improper fine-tuning, or learning with extremely few samples, may even under-perform the zero-shot prediction. Existing methods still address this problem with traditional anti-overfitting techniques such as early stopping and data augmentation, which are not principled solutions specific to prompting. In this paper, we present Prompt-aligned Gradient, dubbed ProGrad, to prevent prompt tuning from forgetting the general knowledge learned from VLMs. In particular, ProGrad only updates the prompt whose gradient is aligned (or non-conflicting) with the general knowledge, which is represented as the optimization direction offered by the predefined prompt predictions. Extensive experiments under the few-shot learning, domain generalization, base-to-new generalization and cross-dataset transfer settings demonstrate the stronger few-shot generalization ability of ProGrad over state-of-the-art prompt tuning methods. Code and theoretical proofs are in the Appendix.
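
The idea of updating only along a gradient that does not conflict with a reference direction can be illustrated with a small sketch. The abstract does not spell out the update rule, so the following assumes one plausible realization: if the task gradient has a negative inner product with the general-knowledge direction, its conflicting component is projected out; otherwise it is used unchanged. The names `align_gradient`, `g_task`, `g_general`, and `strength` are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def align_gradient(
    g_task: Sequence[float],
    g_general: Sequence[float],
    strength: float = 1.0,
) -> List[float]:
    """Return a task gradient that does not conflict with g_general.

    Assumption (not stated in the abstract): a conflict means a negative
    inner product, and it is resolved by subtracting the projection of
    g_task onto g_general, scaled by `strength`.
    """
    inner = dot(g_task, g_general)
    if inner >= 0.0:
        # Aligned or orthogonal: keep the task gradient as it is.
        return list(g_task)
    # inner < 0 implies g_general is non-zero, so the division is safe.
    scale = strength * inner / dot(g_general, g_general)
    return [t - scale * g for t, g in zip(g_task, g_general)]


if __name__ == "__main__":
    general = [1.0, 0.0]
    print(align_gradient([1.0, 1.0], general))   # aligned: left unchanged
    print(align_gradient([-1.0, 1.0], general))  # conflicting: component removed
```

With `strength=1.0` the returned vector is orthogonal to `g_general` whenever a conflict is detected, so a step along it leaves the reference objective unchanged to first order.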

Cite

Text

Zhu et al. "Prompt-Aligned Gradient for Prompt Tuning." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01435

Markdown

[Zhu et al. "Prompt-Aligned Gradient for Prompt Tuning." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/zhu2023iccv-promptaligned/) doi:10.1109/ICCV51070.2023.01435

BibTeX

@inproceedings{zhu2023iccv-promptaligned,
  title     = {{Prompt-Aligned Gradient for Prompt Tuning}},
  author    = {Zhu, Beier and Niu, Yulei and Han, Yucheng and Wu, Yue and Zhang, Hanwang},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {15659-15669},
  doi       = {10.1109/ICCV51070.2023.01435},
  url       = {https://mlanthology.org/iccv/2023/zhu2023iccv-promptaligned/}
}