Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts

Karsten Roth, Jae Myung Kim, A. Sophia Koepke, Oriol Vinyals, Cordelia Schmid, Zeynep Akata

ICCV 2023 pp. 15746-15757

doi:10.1109/ICCV51070.2023.01443

Abstract

The visual classification performance of vision-language models such as CLIP has been shown to benefit from additional semantic knowledge from large language models (LLMs) such as GPT-3. In particular, averaging over LLM-generated class descriptors, e.g. "waffle, which has a round shape", can notably improve generalization performance. In this work, we critically study this behavior and propose WaffleCLIP, a framework for zero-shot visual classification which simply replaces LLM-generated descriptors with random character and word descriptors. Without querying external models, we achieve comparable performance gains on a large number of visual classification tasks. This allows WaffleCLIP to serve both as a low-cost alternative and as a sanity check for any future LLM-based vision-language model extensions. We conduct an extensive experimental study on the impact and shortcomings of additional semantics introduced with LLM-generated descriptors, and show how semantic context, if available, is better leveraged by querying LLMs for high-level concepts, which we show can also be used to resolve potential class name ambiguities. Code is available here: https://github.com/ExplainableML/WaffleCLIP.
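The core idea in the abstract, averaging over descriptor prompts in which the LLM-generated descriptors are replaced by random words and random character strings, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The prompt prefix "A photo of a", the descriptor counts, the choice to share the same random descriptors across all classes, and every function name below are assumptions made for illustration; `toy_embed` is a hash-based placeholder standing in for a real text or image encoder such as CLIP's.

```python
import hashlib
import math
import random
import string


def random_descriptors(num_pairs, vocab, rng, words_per_desc=2, chars_per_word=5):
    """Build `num_pairs` random-word and `num_pairs` random-character descriptors."""
    descriptors = []
    for _ in range(num_pairs):
        # Random word descriptor, e.g. "river cloud".
        descriptors.append(" ".join(rng.choice(vocab) for _ in range(words_per_desc)))
        # Random character descriptor, e.g. "qzkfw plmab".
        descriptors.append(" ".join(
            "".join(rng.choice(string.ascii_lowercase) for _ in range(chars_per_word))
            for _ in range(words_per_desc)
        ))
    return descriptors


def build_prompts(class_name, descriptors):
    """Attach each descriptor to the class name (assumed prompt template)."""
    return [f"A photo of a {class_name}, which has {d}." for d in descriptors]


def toy_embed(text, dim=16):
    """Placeholder encoder (NOT CLIP): deterministic unit vector from a SHA-256 hash."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b - 127.5 for b in digest[:dim]]  # never all zero, so the norm is positive
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def class_score(image_emb, prompts, embed=toy_embed):
    """Average cosine similarity between the image and all prompts of one class."""
    sims = [sum(a * b for a, b in zip(image_emb, embed(p))) for p in prompts]
    return sum(sims) / len(sims)


def classify(image_emb, class_names, descriptors, embed=toy_embed):
    """Score every class with the same shared descriptors and return the best one."""
    scores = {
        name: class_score(image_emb, build_prompts(name, descriptors), embed)
        for name in class_names
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the random descriptors are reproducible
    vocab = ["apple", "river", "quiet", "metal", "cloud"]  # hypothetical word list
    descriptors = random_descriptors(num_pairs=3, vocab=vocab, rng=rng)
    image_emb = toy_embed("stand-in for an encoded image")
    label, scores = classify(image_emb, ["waffle", "pancake"], descriptors)
    print(label, scores)
```

With a real vision-language model, `toy_embed` would be swapped for the model's normalized text embeddings, and `image_emb` would come from its image encoder; the averaging step stays the same. Because the placeholder encoder carries no semantics, the predicted label here is arbitrary and only demonstrates the data flow.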

Cite

Text

Roth et al. "Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.01443

Markdown

[Roth et al. "Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/roth2023iccv-waffling/) doi:10.1109/ICCV51070.2023.01443

BibTeX

@inproceedings{roth2023iccv-waffling,
  title     = {{Waffling Around for Performance: Visual Classification with Random Words and Broad Concepts}},
  author    = {Roth, Karsten and Kim, Jae Myung and Koepke, A. Sophia and Vinyals, Oriol and Schmid, Cordelia and Akata, Zeynep},
  booktitle = {International Conference on Computer Vision},
  year      = {2023},
  pages     = {15746-15757},
  doi       = {10.1109/ICCV51070.2023.01443},
  url       = {https://mlanthology.org/iccv/2023/roth2023iccv-waffling/}
}