Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models

Abstract

Advancements in vision-language models (VLMs) have propelled the field of computer vision, particularly in the zero-shot learning setting. Despite their promise, the effectiveness of these models often diminishes due to domain shifts in test environments. To address this, we introduce the Test-Time Prototype Shifting (TPS) framework, a pioneering approach designed to adapt VLMs to test datasets using unlabeled test inputs. Our method is based on the notion of modulating per-class prototypes in the shared embedding space. By pre-computing and caching prototypes generated with the pre-trained text encoder, TPS not only facilitates optimization-free prototype reuse for subsequent predictions but also enables seamless integration with current advancements in prompt engineering. At test time, TPS dynamically learns shift vectors for each prototype based solely on the given test sample, effectively bridging the domain gap and enhancing classification accuracy. A notable aspect of our framework is its significantly reduced memory and computational demands compared to conventional text-prompt tuning methods. Extensive evaluations across 15 image classification datasets involving natural distribution shifts and cross-dataset generalization, as well as in context-dependent visual reasoning, demonstrate TPS's superior performance, achieving state-of-the-art results while reducing resource requirements. Code is available at https://github.com/elaine-sui/TPS.
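The idea of shifting cached prototypes can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that) and rests on several assumptions not stated in the abstract: the prototypes are random stand-ins for cached text-encoder embeddings, the objective is the entropy of the prediction on a single image embedding (a common test-time adaptation choice), and the names `predict`, `shift_step`, `scale`, and `lr` are hypothetical.

```python
import numpy as np

def l2_normalize(v, axis=-1):
    return v / np.linalg.norm(v, axis=axis, keepdims=True)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def entropy(p):
    return float(-(p * np.log(p + 1e-12)).sum())

def predict(image_emb, prototypes, shifts, scale=10.0):
    """Class probabilities from shifted, re-normalized prototypes."""
    shifted = l2_normalize(prototypes + shifts)
    return softmax(scale * shifted @ image_emb)

def shift_step(image_emb, prototypes, shifts, lr=1e-3, scale=10.0):
    """One gradient-descent step on prediction entropy w.r.t. the shifts.

    Only the shift vectors are updated; the cached prototypes stay fixed.
    The entropy objective is an assumption made for this sketch.
    """
    v = prototypes + shifts                               # (K, D)
    norms = np.linalg.norm(v, axis=1, keepdims=True)      # (K, 1)
    u = v / norms                                         # unit prototypes
    p = softmax(scale * u @ image_emb)                    # (K,)
    h = -(p * np.log(p + 1e-12)).sum()
    # dH/dz_k = -p_k (log p_k + H) for logits z and p = softmax(z)
    dh_dz = -p * (np.log(p + 1e-12) + h)
    # d(x . v/||v||)/dv = (x - (x . u) u) / ||v||
    dz_dv = scale * (image_emb[None, :] - (u @ image_emb)[:, None] * u) / norms
    return shifts - lr * dh_dz[:, None] * dz_dv

# Toy data: 5 classes, 16-dimensional embedding space (both arbitrary).
rng = np.random.default_rng(0)
prototypes = l2_normalize(rng.normal(size=(5, 16)))  # stand-in for cached text prototypes
image_emb = l2_normalize(rng.normal(size=16))        # stand-in for one test image embedding
shifts = np.zeros_like(prototypes)                   # zero shift = plain zero-shot prediction
new_shifts = shift_step(image_emb, prototypes, shifts)
```

Because the prototypes are computed once and only the small `shifts` array receives gradients, no backward pass through a text encoder is needed in this sketch, which is the kind of saving the abstract attributes to TPS relative to text-prompt tuning.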

Cite

Text

Sui et al. "Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models." Winter Conference on Applications of Computer Vision, 2025.

Markdown

[Sui et al. "Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models." Winter Conference on Applications of Computer Vision, 2025.](https://mlanthology.org/wacv/2025/sui2025wacv-just/)

BibTeX

@inproceedings{sui2025wacv-just,
  title     = {{Just Shift It: Test-Time Prototype Shifting for Zero-Shot Generalization with Vision-Language Models}},
  author    = {Sui, Elaine and Wang, Xiaohan and Yeung-Levy, Serena},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2025},
  pages     = {825-835},
  url       = {https://mlanthology.org/wacv/2025/sui2025wacv-just/}
}