CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness

Abstract

The ability to acknowledge the inevitable uncertainty in their knowledge and reasoning is a prerequisite for AI systems to be truly truthful and reliable. In this paper, we present a taxonomy of uncertainty specific to vision-language AI systems, distinguishing between epistemic uncertainty (arising from a lack of information) and aleatoric uncertainty (due to inherent unpredictability), and further explore finer categories within each. Based on this taxonomy, we synthesize a benchmark dataset, CertainlyUncertain, featuring 178K visual question answering (VQA) samples as contrastive pairs. This is achieved by 1) inpainting images to turn previously answerable questions into unanswerable ones; and 2) using image captions to prompt large language models for both answerable and unanswerable questions. Additionally, we introduce a new metric, confidence-weighted accuracy, which is well correlated with both accuracy and calibration error, to address the shortcomings of existing metrics. Despite the recent rapid progress in vision-language models (VLMs), evaluations on our benchmark show that they perform poorly in uncertain scenarios. Further experiments demonstrate that supervised fine-tuning with CertainlyUncertain enhances the performance of VLMs and reduces calibration error. These improvements extend beyond our benchmark to existing refusal-oriented datasets and show positive results in reducing hallucinations, while maintaining performance on standard VQA benchmarks. Our work underscores the importance of addressing uncertainty in vision-language AI systems to improve their reliability and trustworthiness in real-world applications.
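To make the idea of a confidence-weighted score concrete, below is a minimal Python sketch of one plausible instantiation: each prediction's correctness is weighted by the model's self-reported confidence, so over-confident mistakes are penalized and calibrated correct answers are rewarded. The function name, inputs, and scoring rule here are illustrative assumptions; the paper's exact definition of confidence-weighted accuracy may differ.

```python
from typing import Sequence


def confidence_weighted_accuracy(
    correct: Sequence[bool],
    confidence: Sequence[float],
) -> float:
    """Illustrative confidence-weighted accuracy (assumed form, not the paper's formula).

    A correct prediction contributes its confidence; an incorrect one
    contributes (1 - confidence). High confidence on wrong answers thus
    lowers the score, tying the metric to both accuracy and calibration.
    """
    assert len(correct) == len(confidence), "one confidence per prediction"
    scores = [c if ok else (1.0 - c) for ok, c in zip(correct, confidence)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Three hypothetical VQA predictions: two confident and correct,
    # one highly confident but wrong.
    print(confidence_weighted_accuracy([True, True, False], [0.9, 0.8, 0.95]))
```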

Cite

Text

Chandu et al. "CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness." International Conference on Learning Representations, 2025.

Markdown

[Chandu et al. "CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/chandu2025iclr-certainlyuncertain/)

BibTeX

@inproceedings{chandu2025iclr-certainlyuncertain,
  title     = {{CertainlyUncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness}},
  author    = {Chandu, Khyathi and Li, Linjie and Awadalla, Anas and Lu, Ximing and Park, Jae Sung and Hessel, Jack and Wang, Lijuan and Choi, Yejin},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/chandu2025iclr-certainlyuncertain/}
}