A Simple Zero-Shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models

Abstract

Contrastively trained text-image models have the remarkable ability to perform zero-shot classification, that is, classifying previously unseen images into categories that the model has never been explicitly trained to identify. However, these zero-shot classifiers need prompt engineering to achieve high accuracy. Prompt engineering typically requires hand-crafting a set of prompts for individual downstream tasks. In this work, we aim to automate this prompt engineering and improve zero-shot accuracy through prompt ensembling. In particular, we ask “Given a large pool of prompts, can we automatically score the prompts and ensemble those that are most suitable for a particular downstream dataset, without needing access to labeled validation data?” We demonstrate that this is possible. In doing so, we identify several pathologies in a naive prompt scoring method where the score can be easily overconfident due to biases in pre-training and test data, and we propose a novel prompt scoring method that corrects for the biases. Using our proposed scoring method to create a weighted average prompt ensemble, our method overall outperforms an equal-average ensemble, as well as hand-crafted prompts, on ImageNet, 4 of its variants, and 11 fine-grained classification benchmarks, while being fully automatic, optimization-free, and not requiring access to labeled validation data.
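
To make the difference between an equal-average and a weighted-average prompt ensemble concrete, here is a minimal sketch in Python with NumPy. It is not the scoring method proposed in the paper: the abstract does not spell that method out, so the sketch uses a stand-in score (the mean over unlabeled images of the maximum class logit) with no bias correction, and turns scores into weights with a softmax. The function names, the array shapes, the random embeddings, and the softmax weighting are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length along the given axis."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def naive_prompt_scores(image_emb, text_emb):
    """Stand-in prompt score (an assumption, not the paper's corrected score).

    image_emb: (N, D) unlabeled image embeddings.
    text_emb:  (P, C, D) text embeddings, one per prompt template and class.
    Returns a (P,) array: mean over images of the max-over-classes logit.
    """
    logits = np.einsum("nd,pcd->pnc", image_emb, text_emb)  # (P, N, C)
    return logits.max(axis=2).mean(axis=1)


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.exp(x - x.max())
    return z / z.sum()


def ensemble_classifier(text_emb, weights):
    """Weighted average of per-prompt class embeddings, renormalized: (C, D)."""
    averaged = np.einsum("p,pcd->cd", weights, text_emb)
    return l2_normalize(averaged)


# Random embeddings stand in for a real text-image model (hypothetical data).
rng = np.random.default_rng(0)
P, C, D, N = 5, 3, 8, 20  # prompts, classes, embedding dim, images
text_emb = l2_normalize(rng.normal(size=(P, C, D)))
image_emb = l2_normalize(rng.normal(size=(N, D)))

# Score prompts without labels, then build both ensembles.
scores = naive_prompt_scores(image_emb, text_emb)
weights = softmax(scores)
weighted_clf = ensemble_classifier(text_emb, weights)
equal_clf = ensemble_classifier(text_emb, np.full(P, 1.0 / P))

# Zero-shot prediction: nearest class embedding by cosine similarity.
weighted_pred = (image_emb @ weighted_clf.T).argmax(axis=1)
equal_pred = (image_emb @ equal_clf.T).argmax(axis=1)
```

Only the unlabeled images and the prompt pool enter the score, so no labeled validation data or optimization is involved. Replacing `naive_prompt_scores` with a bias-corrected score would leave the rest of the pipeline unchanged.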

Cite

Text

Allingham et al. "A Simple Zero-Shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models." International Conference on Machine Learning, 2023.

Markdown

[Allingham et al. "A Simple Zero-Shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/allingham2023icml-simple/)

BibTeX

@inproceedings{allingham2023icml-simple,
  title     = {{A Simple Zero-Shot Prompt Weighting Technique to Improve Prompt Ensembling in Text-Image Models}},
  author    = {Allingham, James Urquhart and Ren, Jie and Dusenberry, Michael W and Gu, Xiuye and Cui, Yin and Tran, Dustin and Liu, Jeremiah Zhe and Lakshminarayanan, Balaji},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {547-568},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/allingham2023icml-simple/}
}