Detecting Covariate Shifts with Vision-Language Foundation Models

Abstract

Deployed machine learning models often encounter significant challenges in the wild due to distribution shifts, where inputs deviate from the training distribution. Covariate shifts, a specific type of distribution shift, have traditionally been addressed with robustness-focused approaches; however, existing models still experience substantial performance degradation under such conditions. In this work, we propose reframing covariate shift detection as an out-of-distribution (OOD) detection problem. We leverage vision-language models (VLMs), in particular CLIP, to detect covariate shifts using zero-shot detection techniques that require no task-specific training. To facilitate this effort, we introduce ImageNet-CS, a comprehensive benchmark comprising six covariate-shifted datasets derived from ImageNet. Our results demonstrate that VLMs outperform traditional supervised methods in detecting covariate shifts, underscoring their promise for improving the reliability of models deployed in the real world.
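To give a concrete sense of the zero-shot detection techniques the abstract refers to: one common CLIP-based baseline scores an image by the maximum softmax over cosine similarities between its embedding and text embeddings of class prompts (the MCM score of Ming et al.), flagging low-confidence inputs as shifted. The sketch below uses synthetic NumPy embeddings in place of real CLIP features, and is an illustration of this family of detectors, not necessarily the exact scoring rule used in the paper.

```python
import numpy as np

def mcm_score(image_emb, text_embs, temperature=1.0):
    """Maximum-softmax score over image/text cosine similarities.

    image_emb: (d,) image embedding (e.g. from a CLIP image encoder).
    text_embs: (num_classes, d) embeddings of class prompts.
    Returns a confidence in (0, 1]; low values suggest the input
    lies off the in-distribution classes (a possible covariate shift).
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = (txt @ img) / temperature          # cosine similarities
    probs = np.exp(sims - sims.max())         # stable softmax
    probs /= probs.sum()
    return probs.max()

# Toy demo with stand-in embeddings (NOT real CLIP features):
# three orthogonal "class prompt" embeddings.
text_embs = np.eye(3)
aligned = mcm_score(np.array([0.9, 0.1, 0.0]), text_embs)  # matches class 0
uniform = mcm_score(np.array([1.0, 1.0, 1.0]), text_embs)  # matches nothing
# An input aligned with a class prompt scores higher than an
# ambiguous one; thresholding this score yields a shift detector.
```

In practice the embeddings would come from a pretrained vision-language model such as CLIP, with one text prompt per in-distribution class (e.g. "a photo of a {class}"), and the detection threshold would be calibrated on held-out in-distribution data.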

Cite

Text

Heng and Soh. "Detecting Covariate Shifts with Vision-Language Foundation Models." ICLR 2025 Workshops: FM-Wild, 2025.

Markdown

[Heng and Soh. "Detecting Covariate Shifts with Vision-Language Foundation Models." ICLR 2025 Workshops: FM-Wild, 2025.](https://mlanthology.org/iclrw/2025/heng2025iclrw-detecting/)

BibTeX

@inproceedings{heng2025iclrw-detecting,
  title     = {{Detecting Covariate Shifts with Vision-Language Foundation Models}},
  author    = {Heng, Alvin and Soh, Harold},
  booktitle = {ICLR 2025 Workshops: FM-Wild},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/heng2025iclrw-detecting/}
}