Estimating Uncertainty in Multimodal Foundation Models Using Public Internet Data

Abstract

Foundation models are trained on vast amounts of data using self-supervised learning, enabling adaptation to a wide range of downstream tasks. At test time, these models exhibit zero-shot capabilities that let them classify previously unseen (user-specified) categories. In this paper, we address the problem of quantifying uncertainty in these zero-shot predictions. We propose a heuristic approach for uncertainty estimation in zero-shot settings using conformal prediction with web data. Given a set of classes at test time, we conduct zero-shot classification with CLIP-style models using a prompt template, e.g., "an image of a <category>", and use the same template as a search query to source calibration data from the open web. Given this web-based calibration set, we apply conformal prediction with a novel conformity score that accounts for potential errors in the retrieved web data. We evaluate the utility of our proposed method on biomedical foundation models; our preliminary results show that web-based conformal prediction sets achieve the target coverage with satisfactory efficiency on a variety of biomedical datasets.
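To make the calibration step concrete, the following is a minimal sketch of standard split conformal prediction for classification, assuming the zero-shot class probabilities for the web-retrieved calibration images have already been computed. It is not the paper's method: the abstract does not specify the novel conformity score, so the common score of one minus the probability of the labeled class is swapped in, and the web retrieval step is omitted. The function names `conformal_threshold` and `prediction_sets` and the `alpha` parameter are hypothetical, chosen for illustration only.

```python
import math

import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Score threshold for target coverage 1 - alpha (standard split conformal).

    cal_probs:  (n, K) array of class probabilities on the calibration set.
    cal_labels: (n,) integer labels; here they would come from the search
                query used to retrieve each image (an assumption).
    """
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels, dtype=int)
    n = len(cal_labels)
    # Nonconformity score: 1 - probability of the labeled class.
    # This is the standard score, NOT the paper's noise-aware score.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return math.inf
    return float(scores[k - 1])


def prediction_sets(test_probs, threshold):
    """Return, for each test row, the class indices whose score is within the threshold."""
    test_probs = np.asarray(test_probs, dtype=float)
    return [np.flatnonzero(1.0 - p <= threshold) for p in test_probs]
```

A call such as `prediction_sets(test_probs, conformal_threshold(cal_probs, cal_labels, alpha=0.1))` would then yield one set of candidate classes per test image. The usual coverage guarantee relies on calibration and test data being exchangeable, which web-retrieved images may not satisfy; this is consistent with the abstract describing the approach as heuristic.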

Cite

Text

Dutta et al. "Estimating Uncertainty in Multimodal Foundation Models Using Public Internet Data." NeurIPS 2023 Workshops: R0-FoMo, 2023.

Markdown

[Dutta et al. "Estimating Uncertainty in Multimodal Foundation Models Using Public Internet Data." NeurIPS 2023 Workshops: R0-FoMo, 2023.](https://mlanthology.org/neuripsw/2023/dutta2023neuripsw-estimating/)

BibTeX

@inproceedings{dutta2023neuripsw-estimating,
  title     = {{Estimating Uncertainty in Multimodal Foundation Models Using Public Internet Data}},
  author    = {Dutta, Shiladitya and Wei, Hongbo and van der Laan, Lars and Alaa, Ahmed},
  booktitle = {NeurIPS 2023 Workshops: R0-FoMo},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/dutta2023neuripsw-estimating/}
}