UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model
Abstract
Human-labeled data is essential for deep learning models, but high annotation costs hinder their deployment in real-world applications. Recently, however, models such as CLIP have shown remarkable zero-shot capabilities through vision-language pre-training. Although fine-tuning with human-labeled data can further improve the performance of zero-shot models, it is often impractical in low-budget real-world scenarios. In this paper, we propose an alternative algorithm, dubbed Unsupervised Open-Set Task Adaptation (UOTA), which fully leverages the large amounts of open-set unlabeled data collected in the wild to improve pre-trained zero-shot models in real-world scenarios.
Cite
Text
Min et al. "UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model." ICML 2023 Workshops: ES-FoMO, 2023.

Markdown

[Min et al. "UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model." ICML 2023 Workshops: ES-FoMO, 2023.](https://mlanthology.org/icmlw/2023/min2023icmlw-uota/)

BibTeX
@inproceedings{min2023icmlw-uota,
  title     = {{UOTA: Unsupervised Open-Set Task Adaptation Using a Vision-Language Foundation Model}},
  author    = {Min, Youngjo and Ryoo, Kwangrok and Kim, Bumsoo and Kim, Taesup},
  booktitle = {ICML 2023 Workshops: ES-FoMO},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/min2023icmlw-uota/}
}