E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

Abstract

As the size of transformer-based models continues to grow, fine-tuning these large-scale pre-trained vision models for new tasks has become increasingly parameter-intensive. Parameter-efficient learning has been developed to reduce the number of tunable parameters during fine-tuning. Although these methods show promising results, there is still a significant performance gap compared to full fine-tuning. To address this challenge, we propose an Effective and Efficient Visual Prompt Tuning (E^2VPT) approach for large-scale transformer-based model adaptation. Specifically, we introduce a set of learnable key-value prompts and visual prompts into self-attention and input layers, respectively, to improve the effectiveness of model fine-tuning. Moreover, we design a prompt pruning procedure to systematically prune low-importance prompts while preserving model performance, which substantially improves the model's efficiency. Empirical results demonstrate that our approach outperforms several state-of-the-art baselines on two benchmarks, with remarkably low parameter usage (e.g., 0.32% of model parameters on VTAB-1k). We anticipate that this work will inspire further exploration within the pretrain-then-finetune paradigm for large-scale models.
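To make the abstract's two mechanisms concrete, below is a minimal PyTorch sketch: learnable key-value prompts prepended inside self-attention, visual prompts prepended to the input token sequence, and a simple pruning pass. This is not the authors' implementation; the class and function names, prompt counts, and initialization are illustrative assumptions, and the L2-norm pruning criterion is a stand-in for the paper's learned importance scores.

import torch
import torch.nn as nn

class KVPromptedAttention(nn.Module):
    """Self-attention with learnable key-value prompts (illustrative sketch).

    A small number of prompt vectors are prepended to the keys and values
    (queries are left untouched), so every token can attend to the prompts
    while the backbone weights stay frozen.
    """

    def __init__(self, dim: int = 768, num_heads: int = 12, num_kv_prompts: int = 5):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable key/value prompts, shared across the batch.
        self.k_prompt = nn.Parameter(torch.zeros(num_kv_prompts, dim))
        self.v_prompt = nn.Parameter(torch.zeros(num_kv_prompts, dim))
        nn.init.normal_(self.k_prompt, std=0.02)
        nn.init.normal_(self.v_prompt, std=0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Prepend prompts to keys and values only; queries keep length N.
        k = torch.cat([self.k_prompt.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.v_prompt.expand(B, -1, -1), v], dim=1)

        def heads(t: torch.Tensor) -> torch.Tensor:
            # (B, L, C) -> (B, num_heads, L, head_dim)
            return t.view(B, t.shape[1], self.num_heads, self.head_dim).transpose(1, 2)

        attn = (heads(q) @ heads(k).transpose(-2, -1)) * self.head_dim ** -0.5
        out = (attn.softmax(dim=-1) @ heads(v)).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)


def prepend_visual_prompts(patch_tokens: torch.Tensor,
                           visual_prompts: nn.Parameter) -> torch.Tensor:
    """Prepend learnable visual prompts to the input-layer token sequence."""
    B = patch_tokens.shape[0]
    return torch.cat([visual_prompts.expand(B, -1, -1), patch_tokens], dim=1)


def prune_prompts_by_norm(prompts: nn.Parameter, keep: int) -> torch.Tensor:
    """Drop low-importance prompts. The paper scores prompts with learned
    importance masks; the L2-norm criterion here is only a stand-in."""
    scores = prompts.detach().norm(dim=-1)
    keep_idx = scores.topk(keep).indices.sort().values
    return prompts[keep_idx]


# Usage: frozen ViT-style patch tokens with 10 visual prompts at the input.
tokens = torch.randn(2, 196, 768)                    # (batch, patches, dim)
visual_prompts = nn.Parameter(torch.randn(10, 768) * 0.02)
x = prepend_visual_prompts(tokens, visual_prompts)   # (2, 206, 768)
out = KVPromptedAttention()(x)                       # (2, 206, 768)

During fine-tuning, only the prompt parameters (and a task head) would be trained while the backbone stays frozen, which is where the low tunable-parameter count reported in the abstract comes from.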

Cite

Text

Han et al. "E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01604

Markdown

[Han et al. "E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/han2023iccv-2vpt/) doi:10.1109/ICCV51070.2023.01604

BibTeX

@inproceedings{han2023iccv-2vpt,
  title     = {{E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning}},
  author    = {Han, Cheng and Wang, Qifan and Cui, Yiming and Cao, Zhiwen and Wang, Wenguan and Qi, Siyuan and Liu, Dongfang},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {17491--17502},
  doi       = {10.1109/ICCV51070.2023.01604},
  url       = {https://mlanthology.org/iccv/2023/han2023iccv-2vpt/}
}