LOVM: Language-Only Vision Model Selection

Abstract

Pre-trained multi-modal vision-language models (VLMs) are becoming increasingly popular due to their exceptional performance on downstream vision applications, particularly in the few- and zero-shot settings. However, selecting the best-performing VLM for a given downstream application is non-trivial, as the choice is dataset- and task-dependent. Meanwhile, exhaustively evaluating all available VLMs on a novel application is not only time- and compute-intensive but also requires collecting a labeled dataset for evaluation. As the number of open-source VLM variants increases, there is a need for an efficient model selection strategy that does not require access to a curated evaluation dataset. This paper proposes a novel task and benchmark for efficiently evaluating VLMs' zero-shot performance on downstream applications without access to the downstream task dataset. Specifically, we introduce a new task, LOVM: Language-Only Vision Model Selection, where methods are expected to perform both model selection and performance prediction based solely on a text description of the desired downstream application. We then introduce an extensive LOVM benchmark consisting of ground-truth evaluations of 35 pre-trained VLMs and 23 datasets, where methods are expected to rank the pre-trained VLMs and predict their zero-shot performance.
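The LOVM task therefore takes only a textual specification of the downstream application as input and expects, as output, a ranking of candidate VLMs together with predicted zero-shot performance. The Python sketch below is a minimal, hypothetical illustration of that input/output interface only; the class names, placeholder scoring, and function names are assumptions for illustration and are not the authors' implementation or benchmark code.

from dataclasses import dataclass
from typing import List


@dataclass
class TaskDescription:
    """Text-only specification of the desired downstream application."""
    domain: str             # e.g. "natural photos", "satellite imagery"
    class_names: List[str]  # class names for the zero-shot classifier
    task: str               # e.g. "classification"


@dataclass
class ModelPrediction:
    model_name: str            # identifier of a candidate pre-trained VLM
    predicted_accuracy: float  # estimated zero-shot top-1 accuracy


def select_vlm(task: TaskDescription,
               candidate_models: List[str]) -> List[ModelPrediction]:
    """Rank candidate VLMs for the described task using text alone.

    A real LOVM method would derive per-model scores from the task's text
    description (no task images are available); here a placeholder score of
    0.0 is used purely to show the expected inputs and outputs.
    """
    predictions = [ModelPrediction(name, predicted_accuracy=0.0)
                   for name in candidate_models]
    # Return models ordered best-first by predicted zero-shot accuracy.
    return sorted(predictions, key=lambda p: p.predicted_accuracy, reverse=True)


if __name__ == "__main__":
    task = TaskDescription(
        domain="natural photos",
        class_names=["cat", "dog", "horse"],
        task="classification",
    )
    ranking = select_vlm(task, ["ViT-B-32/openai", "ViT-L-14/laion2b"])
    for p in ranking:
        print(p.model_name, p.predicted_accuracy)

A benchmark method would replace the placeholder scoring with text-derived estimates, and evaluation would then compare the predicted ranking and accuracies against the ground-truth zero-shot results collected over the 35 VLMs and 23 datasets.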

Cite

Text

Zohar et al. "LOVM: Language-Only Vision Model Selection." Neural Information Processing Systems, 2023.

Markdown

[Zohar et al. "LOVM: Language-Only Vision Model Selection." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/zohar2023neurips-lovm/)

BibTeX

@inproceedings{zohar2023neurips-lovm,
  title     = {{LOVM: Language-Only Vision Model Selection}},
  author    = {Zohar, Orr and Huang, Shih-Cheng and Wang, Kuan-Chieh and Yeung, Serena},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/zohar2023neurips-lovm/}
}