Discovering Unwritten Visual Classifiers with Large Language Models

Chiquier, Mia; Mall, Utkarsh; Vondrick, Carl

doi:10.1007/978-3-031-73039-9_11

ECCV 2024

Abstract

Multimodal pre-trained models, such as CLIP, are popular for zero-shot classification due to their open-vocabulary flexibility and high performance. However, vision-language models, which compute similarity scores between images and class labels, are largely black-box, with limited interpretability, risk for bias, and inability to discover new visual concepts not written down. Moreover, in practical settings, the vocabulary for class names and attributes of specialized concepts will not be known, preventing these methods from performing well on images uncommon in large-scale vision-language datasets. To address these limitations, we present a novel method that discovers interpretable yet discriminative sets of attributes for visual recognition. We introduce an evolutionary search algorithm that uses the in-context learning abilities of large language models to iteratively mutate a concept bottleneck of attributes for classification. Our method produces state-of-the-art, interpretable fine-grained classifiers. We outperform the baselines by 18.4% on five fine-grained iNaturalist datasets and by 22.2% on two KikiBouba datasets, despite the baselines having access to privileged information.
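
The evolutionary search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the prompts, the scoring model, or the selection rule, so everything below is an assumption. In particular, `toy_mutate` is a hypothetical random stand-in for the large language model call (which would propose new attributes from in-context examples of previously scored attribute sets), and `toy_fitness` is a hypothetical stand-in for classification accuracy measured with a vision-language model. The names `evolve_attributes`, `VOCAB`, and `USEFUL` are likewise invented for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Attributes = Tuple[str, ...]
Scored = Tuple[float, Attributes]


def evolve_attributes(
    initial: Sequence[Attributes],
    fitness: Callable[[Attributes], float],
    mutate: Callable[[Attributes, List[Scored]], Attributes],
    generations: int = 30,
) -> Scored:
    """Generic elitist loop: score candidates, keep the best half, mutate them."""
    population = list(initial)
    history: List[Scored] = []  # past (score, attributes) pairs shown to the mutator
    for _ in range(generations):
        scored = sorted(((fitness(a), a) for a in population), reverse=True)
        history.extend(scored)
        survivors = [a for _, a in scored[: max(1, len(scored) // 2)]]
        children = [mutate(a, history) for a in survivors]
        population = survivors + children  # survivors are kept, so the best never regresses
    return max((fitness(a), a) for a in population)


# Toy stand-ins (assumptions, not from the paper).
VOCAB = [
    "striped wings", "red beak", "long tail", "spotted chest",
    "yellow legs", "curved bill", "white crown", "short neck",
]
USEFUL = {"red beak", "long tail", "white crown"}  # hidden "discriminative" attributes


def toy_fitness(attrs: Attributes) -> float:
    # Placeholder for classifier accuracy: fraction of useful attributes found.
    return len(USEFUL.intersection(attrs)) / len(USEFUL)


def make_toy_mutate(rng: random.Random):
    def toy_mutate(attrs: Attributes, history: List[Scored]) -> Attributes:
        # An LLM-based mutator would condition on `history`; this stub ignores it
        # and swaps one attribute for a random unused one.
        slot = rng.randrange(len(attrs))
        candidates = [w for w in VOCAB if w not in attrs]
        mutated = list(attrs)
        mutated[slot] = rng.choice(candidates)
        return tuple(mutated)
    return toy_mutate


rng = random.Random(0)
initial = [tuple(rng.sample(VOCAB, 3)) for _ in range(4)]
best_score, best_attrs = evolve_attributes(initial, toy_fitness, make_toy_mutate(rng))
```

Because the attribute set itself is the classifier's bottleneck, the result `best_attrs` stays human-readable at every generation, which is the interpretability property the abstract emphasizes. Swapping `toy_mutate` for a real language model call and `toy_fitness` for held-out accuracy would leave the loop unchanged.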

Cite

Text

Chiquier et al. "Discovering Unwritten Visual Classifiers with Large Language Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73039-9_11

Markdown

[Chiquier et al. "Discovering Unwritten Visual Classifiers with Large Language Models." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/chiquier2024eccv-discovering/) doi:10.1007/978-3-031-73039-9_11

BibTeX

@inproceedings{chiquier2024eccv-discovering,
  title     = {{Discovering Unwritten Visual Classifiers with Large Language Models}},
  author    = {Chiquier, Mia and Mall, Utkarsh and Vondrick, Carl},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73039-9_11},
  url       = {https://mlanthology.org/eccv/2024/chiquier2024eccv-discovering/}
}