Meta-Prompting for Automating Zero-Shot Visual Recognition with LLMs

Abstract

Prompt ensembling of Large Language Model (LLM) generated category-specific prompts has emerged as an effective method to enhance the zero-shot recognition ability of Vision-Language Models (VLMs). To obtain these category-specific prompts, current methods rely on hand-crafting the prompts given to the LLMs, which in turn generate the VLM prompts for the downstream tasks. However, this requires manually composing these task-specific prompts, and even then they might not cover the diverse set of visual concepts and task-specific styles associated with the categories of interest. To effectively take humans out of the loop and completely automate the prompt generation process for zero-shot recognition, we propose Meta-Prompting for Visual Recognition (MPVR). Taking as input only minimal information about the target task, in the form of its short natural language description and a list of associated class labels, MPVR automatically produces a diverse set of category-specific prompts, resulting in a strong zero-shot classifier. MPVR generalizes effectively across various popular zero-shot image recognition benchmarks belonging to widely different domains when tested with multiple LLMs and VLMs. For example, MPVR obtains a zero-shot recognition improvement over CLIP by up to 19.8% and 18.2% (5.0% and 4.5% on average over 20 datasets) leveraging GPT and Mixtral LLMs, respectively.
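
The prompt-ensembling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the category-specific prompts have already been generated by an LLM and embedded by a VLM text encoder; the function names (`build_classifier`, `classify`) and the three-dimensional toy vectors are hypothetical stand-ins, not taken from the paper, and the meta-prompting stage itself is not shown. Each class weight is the normalized mean of its prompt embeddings, and an image is assigned to the class with the highest cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_classifier(prompt_embeddings):
    """Average the normalized embeddings of each class's prompts.

    prompt_embeddings maps a class label to a list of embedding vectors,
    one per category-specific prompt (hypothetical input format).
    """
    classifier = {}
    for label, embeddings in prompt_embeddings.items():
        normed = [l2_normalize(e) for e in embeddings]
        dim = len(normed[0])
        mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
        classifier[label] = l2_normalize(mean)
    return classifier


def classify(image_embedding, classifier):
    """Return the best label and the cosine similarity to every class."""
    image = l2_normalize(image_embedding)
    scores = {
        label: sum(a * b for a, b in zip(image, weight))
        for label, weight in classifier.items()
    }
    return max(scores, key=scores.get), scores


# Toy embeddings invented for illustration; a real pipeline would obtain
# them from a VLM's text and image encoders.
prompt_embeddings = {
    "cat": [[1.0, 0.1, 0.0], [0.9, 0.0, 0.2]],
    "dog": [[0.0, 1.0, 0.1], [0.1, 0.8, 0.0]],
}
classifier = build_classifier(prompt_embeddings)
label, scores = classify([0.8, 0.2, 0.1], classifier)
print(label, scores)
```

Averaging after normalization keeps any single long or unusual prompt from dominating a class weight, which is one common way to build such an ensemble; other weighting schemes are possible.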

Cite

Text

Mirza et al. "Meta-Prompting for Automating Zero-Shot Visual Recognition with LLMs." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72627-9_21

Markdown

[Mirza et al. "Meta-Prompting for Automating Zero-Shot Visual Recognition with LLMs." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/mirza2024eccv-metaprompting/) doi:10.1007/978-3-031-72627-9_21

BibTeX

@inproceedings{mirza2024eccv-metaprompting,
  title     = {{Meta-Prompting for Automating Zero-Shot Visual Recognition with LLMs}},
  author    = {Mirza, Muhammad Jehanzeb and Karlinsky, Leonid and Lin, Wei and Doveh, Sivan and Micorek, Jakub and Kozinski, Mateusz and Kuehne, Hilde and Possegger, Horst},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72627-9_21},
  url       = {https://mlanthology.org/eccv/2024/mirza2024eccv-metaprompting/}
}