UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model

Abstract

Human-labeled data is essential for deep learning models, but annotation costs hinder their use in real-world applications. Recently, however, models such as CLIP have shown remarkable zero-shot capabilities through vision-language pre-training. Although fine-tuning with human-labeled data can further improve the performance of zero-shot models, it is often impractical in low-budget real-world scenarios. In this paper, we propose an alternative algorithm, dubbed Unsupervised Open-Set Task Adaptation (UOTA), which fully leverages the large amounts of open-set unlabeled data collected in the wild to improve pre-trained zero-shot models in such scenarios.

Cite

Text

Min et al. "UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model." ICML 2023 Workshops: ES-FoMO, 2023.

Markdown

[Min et al. "UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/min2023icmlw-uota/)

BibTeX

@inproceedings{min2023icmlw-uota,
  title     = {{UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model}},
  author    = {Min, Youngjo and Ryoo, Kwangrok and Kim, Bumsoo and Kim, Taesup},
  booktitle = {ICML 2023 Workshops: ES-FoMO},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/min2023icmlw-uota/}
}