ChatGPT-Powered Hierarchical Comparisons for Image Classification

Abstract

The zero-shot open-vocabulary setting poses challenges for image classification.Fortunately, utilizing a vision-language model like CLIP, pre-trained on image-textpairs, allows for classifying images by comparing embeddings. Leveraging largelanguage models (LLMs) such as ChatGPT can further enhance CLIP’s accuracyby incorporating class-specific knowledge in descriptions. However, CLIP stillexhibits a bias towards certain classes and generates similar descriptions for similarclasses, disregarding their differences. To address this problem, we present anovel image classification framework via hierarchical comparisons. By recursivelycomparing and grouping classes with LLMs, we construct a class hierarchy. Withsuch a hierarchy, we can classify an image by descending from the top to the bottomof the hierarchy, comparing image and text embeddings at each level. Throughextensive experiments and analyses, we demonstrate that our proposed approach isintuitive, effective, and explainable. Code will be released upon publication.

Cite

Text

Ren et al. "ChatGPT-Powered Hierarchical Comparisons for Image Classification." Neural Information Processing Systems, 2023.

Markdown

[Ren et al. "ChatGPT-Powered Hierarchical Comparisons for Image Classification." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/ren2023neurips-chatgptpowered/)

BibTeX

@inproceedings{ren2023neurips-chatgptpowered,
  title     = {{ChatGPT-Powered Hierarchical Comparisons for Image Classification}},
  author    = {Ren, Zhiyuan and Su, Yiyang and Liu, Xiaoming},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/ren2023neurips-chatgptpowered/}
}