WATT: Weight Average Test Time Adaptation of CLIP
Abstract
Vision-Language Models (VLMs) such as CLIP have achieved unprecedented performance in zero-shot image classification, yet their ability to generalize can still be seriously challenged when they are confronted with domain shifts. In response, we present Weight Average Test-Time Adaptation (WATT) of CLIP, a new approach that enables full test-time adaptation (TTA) of this VLM. Our method employs a diverse set of templates for text prompts, building on CLIP's existing prompt-based framework. The model's predictions serve as pseudo-labels for updating it, and the resulting weights are then averaged to consolidate the learned information globally. Furthermore, we introduce a text ensemble strategy that improves overall test performance by aggregating diverse textual cues. Our findings underscore the effectiveness of WATT across diverse datasets, including CIFAR-10-C, CIFAR-10.1, CIFAR-100-C, VisDA-C, and several other challenging datasets, covering a wide range of domain shifts. Notably, these improvements are achieved without additional model transformations or trainable modules. Moreover, compared to other TTA methods, our approach can operate effectively with just a single image. The code is available at: https://github.com/Mehrdad-Noori/WATT.
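The adaptation loop summarized above can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation (that is in the linked repository) and makes several simplifying assumptions: CLIP's encoders are replaced by fixed feature vectors, the trainable parameters are a hypothetical per-dimension scaling vector w standing in for the adapted weights of the vision encoder, and logits are plain dot products rather than temperature-scaled cosine similarities. For each text template, a copy of w is updated with cross-entropy on that template's own pseudo-labels; the copies are then averaged element-wise, and the final prediction uses template-averaged text embeddings. The function names (adapt_one_template, watt_parallel, ensemble_predict) and hyperparameters (lr, steps, rounds) are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]

# Toy sketch only: "w" is a hypothetical stand-in for CLIP's adapted weights,
# and image/text embeddings are fixed vectors instead of encoder outputs.


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def argmax(v: Vector) -> int:
    return max(range(len(v)), key=v.__getitem__)


def logits(w: Vector, x: Vector, text: List[Vector]) -> Vector:
    # Dot product between the scaled image feature (w * x) and each class embedding.
    return [sum(wd * xd * td for wd, xd, td in zip(w, x, t)) for t in text]


def adapt_one_template(w: Vector, images: List[Vector], text: List[Vector],
                       lr: float, steps: int) -> Vector:
    w = list(w)  # each template starts from a copy of the shared weights
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in images:
            z = logits(w, x, text)
            y = argmax(z)  # pseudo-label from the current model (assumed recomputed each step)
            p = softmax(z)
            # d CE / d w_d = x_d * (sum_c p_c * t_{c,d} - t_{y,d})
            for d in range(len(w)):
                expected = sum(p_c * t[d] for p_c, t in zip(p, text))
                grad[d] += x[d] * (expected - text[y][d]) / len(images)
        w = [wd - lr * gd for wd, gd in zip(w, grad)]
    return w


def watt_parallel(w: Vector, images: List[Vector], templates: List[List[Vector]],
                  lr: float = 0.1, steps: int = 2, rounds: int = 3) -> Vector:
    for _ in range(rounds):
        # Adapt one copy of the weights per text template, all from the same start.
        candidates = [adapt_one_template(w, images, text, lr, steps) for text in templates]
        # Weight averaging: element-wise mean of the per-template weights.
        w = [sum(ws) / len(ws) for ws in zip(*candidates)]
    return w


def ensemble_predict(w: Vector, x: Vector, templates: List[List[Vector]]) -> int:
    # Text ensemble: average each class embedding over all templates.
    n_classes = len(templates[0])
    avg_text = [
        [sum(vals) / len(templates) for vals in zip(*(tpl[c] for tpl in templates))]
        for c in range(n_classes)
    ]
    return argmax(logits(w, x, avg_text))


# Two hypothetical templates, each giving embeddings for two classes.
templates = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 0.9]],
]
images = [[2.0, 0.5], [0.3, 1.5]]

w = watt_parallel([1.0, 1.0], images, templates)
preds = [ensemble_predict(w, x, templates) for x in images]
print(w, preds)
```

Because every template's run starts from the same shared weights, the runs are independent of one another and could be computed in parallel; averaging then merges them into a single set of weights. The same functions also accept a batch containing a single image.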
Cite
Osowiechi et al. "WATT: Weight Average Test Time Adaptation of CLIP." Neural Information Processing Systems, 2024. doi:10.52202/079017-1522
@inproceedings{osowiechi2024neurips-watt,
title = {{WATT: Weight Average Test Time Adaptation of CLIP}},
author = {Osowiechi, David and Noori, Mehrdad and Hakim, Gustavo A. Vargas and Yazdanpanah, Moslem and Bahri, Ali and Cheraghalikhani, Milad and Dastani, Sahar and Beizaee, Farzad and Ayed, Ismail Ben and Desrosiers, Christian},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-1522},
url = {https://mlanthology.org/neurips/2024/osowiechi2024neurips-watt/}
}