Skip Tuning: Pre-Trained Vision-Language Models Are Effective and Efficient Adapters Themselves
Abstract
Prompt tuning (PT) has long been recognized as an effective and efficient paradigm for transferring large pre-trained vision-language models (VLMs) to downstream tasks by learning a tiny set of context vectors. Nevertheless, in this work, we reveal that freezing the parameters of VLMs while learning the context vectors neither facilitates the transferability of pre-trained knowledge nor significantly improves memory and time efficiency. Upon further investigation, we find that reducing both the length and width of the feature-gradient propagation flows of the full fine-tuning (FT) baseline is key to achieving effective and efficient knowledge transfer. Motivated by this, we propose Skip Tuning, a novel paradigm for adapting VLMs to downstream tasks. Unlike existing PT or adapter-based methods, Skip Tuning applies Layer-wise Skipping (LSkip) and Class-wise Skipping (CSkip) on top of the FT baseline without introducing extra context vectors or adapter modules. Extensive experiments across a wide spectrum of benchmarks demonstrate the superior effectiveness and efficiency of Skip Tuning over both PT and adapter-based methods. Code: https://github.com/anonymity-007/SkipT.
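The two skipping mechanisms named in the abstract can be pictured with a short sketch. The code below is a minimal, hypothetical illustration and not the authors' implementation (see the linked repository for that); TinyEncoder, cskip_logits, start_layer, and keep_k are all assumed names. LSkip shortens the feature-gradient path by running the shallow layers without gradients, and CSkip narrows it by back-propagating only through a small subset of class logits.

import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in for a pre-trained transformer image encoder (assumed here)."""
    def __init__(self, dim=64, depth=6):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
            for _ in range(depth)
        )

    def forward(self, x, start_layer=0):
        # LSkip (illustrative): run shallow layers without gradients, so the
        # feature-gradient flow only spans the top (depth - start_layer) layers.
        for i, layer in enumerate(self.layers):
            if i < start_layer:
                with torch.no_grad():
                    x = layer(x)
            else:
                x = layer(x)
        return x

def cskip_logits(img_feat, text_feats, keep_k):
    # CSkip (illustrative): keep logits only for the keep_k most similar
    # classes; the rest are masked out, narrowing the backward flow.
    sims = img_feat @ text_feats.t()  # (batch, num_classes) similarities
    topk = sims.topk(keep_k, dim=-1).indices
    mask = torch.zeros_like(sims, dtype=torch.bool).scatter_(1, topk, True)
    return sims.masked_fill(~mask, float("-inf"))

# Toy usage: skip the first 4 of 6 layers, keep 16 of 100 classes.
enc = TinyEncoder()
img_feat = enc(torch.randn(8, 50, 64), start_layer=4).mean(dim=1)
logits = cskip_logits(img_feat, torch.randn(100, 64), keep_k=16)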
Cite
Text
Wu et al. "Skip Tuning: Pre-Trained Vision-Language Models Are Effective and Efficient Adapters Themselves." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01372
Markdown
[Wu et al. "Skip Tuning: Pre-Trained Vision-Language Models Are Effective and Efficient Adapters Themselves." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/wu2025cvpr-skip/) doi:10.1109/CVPR52734.2025.01372
BibTeX
@inproceedings{wu2025cvpr-skip,
title = {{Skip Tuning: Pre-Trained Vision-Language Models Are Effective and Efficient Adapters Themselves}},
author = {Wu, Shihan and Zhang, Ji and Zeng, Pengpeng and Gao, Lianli and Song, Jingkuan and Shen, Heng Tao},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {14723--14732},
doi = {10.1109/CVPR52734.2025.01372},
url = {https://mlanthology.org/cvpr/2025/wu2025cvpr-skip/}
}