Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification

Abstract

Large vision-language models (LVLMs) have achieved substantial advances in multimodal understanding. However, when presented with challenging or distribution-shifted inputs, they frequently produce unreliable or even harmful content, such as hallucinations or toxic responses. We refer to such misalignments with human expectations as misbehaviors of LVLMs, which raise serious concerns for their deployment in critical applications. Existing research has shown that such misbehaviors are closely linked to model uncertainty. We find that they primarily stem from two distinct sources of epistemic uncertainty: internal contradictions (conflict) and the absence of supporting information (ignorance). Existing uncertainty quantification methods typically capture only total predictive uncertainty and struggle to distinguish between these underlying causes. To address this gap, we propose Evidential Uncertainty Quantification (EUQ), a training-free framework that explicitly decomposes epistemic uncertainty into conflict (CF) and ignorance (IG). Specifically, we interpret features from the model output head as either supporting (positive) or opposing (negative) evidence. Leveraging the Dempster-Shafer theory of belief functions, we aggregate this evidence to quantify internal conflict and knowledge gaps within a single forward pass. We extensively evaluate EUQ across four misbehavior categories: hallucinations, jailbreaks, adversarial vulnerabilities, and out-of-distribution (OOD) failures, using state-of-the-art LVLMs. Experimental results demonstrate that EUQ consistently outperforms strong baselines, achieving relative improvements of up to 10.5% in AUROC. Our evaluation further reveals that hallucinations correspond to high internal conflict and OOD failures to high ignorance. Furthermore, a layer-wise analysis of evidential uncertainty dynamics provides a novel perspective on the evolution of internal representations. The source code is available at https://github.com/HT86159/EUQ.
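To make the distinction between conflict and ignorance concrete, the following is a minimal sketch of the textbook Dempster-Shafer quantities on a toy two-class frame. It is not the EUQ implementation: the paper derives evidence from output-head features, whereas here the mass functions, the class labels, and the helper names `combine` and `ignorance` are all hypothetical and chosen only for illustration. Conflict is taken as the mass that Dempster's rule assigns to the empty set before normalization, and ignorance as the mass left on the whole frame after combination.

```python
from itertools import product


def combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (focal set) to its mass, with
    masses summing to 1. Returns (combined mass function, conflict K),
    where K is the total mass falling on empty intersections.
    """
    joint = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            joint[inter] = joint.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: the sources cannot be combined")
    # Renormalize the remaining mass by 1 - K.
    combined = {s: w / (1.0 - conflict) for s, w in joint.items()}
    return combined, conflict


def ignorance(m, frame):
    """Mass assigned to the whole frame, i.e. committed to no hypothesis."""
    return m.get(frame, 0.0)


# Hypothetical two-class frame of discernment.
FRAME = frozenset({"cat", "dog"})
CAT = frozenset({"cat"})
DOG = frozenset({"dog"})

# Two strong but contradictory sources: high conflict, low ignorance.
m_a = {CAT: 0.8, FRAME: 0.2}
m_b = {DOG: 0.8, FRAME: 0.2}
fused_conflict, k_conflict = combine(m_a, m_b)

# Two weak sources: low conflict, high ignorance.
m_c = {CAT: 0.1, FRAME: 0.9}
m_d = {DOG: 0.1, FRAME: 0.9}
fused_ignorant, k_ignorant = combine(m_c, m_d)

print(f"contradictory sources: K={k_conflict:.2f}, "
      f"ignorance={ignorance(fused_conflict, FRAME):.3f}")
print(f"weak sources:          K={k_ignorant:.2f}, "
      f"ignorance={ignorance(fused_ignorant, FRAME):.3f}")
```

In this toy setting the two cases are indistinguishable by class probabilities alone, since both end up evenly split between the two classes, yet they differ sharply in conflict and ignorance. That is the kind of separation the abstract attributes to hallucinations versus OOD failures, under the assumptions stated above.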

Cite

Text

Huang et al. "Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification." International Conference on Learning Representations, 2026.

Markdown

[Huang et al. "Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/huang2026iclr-detecting/)

BibTeX

@inproceedings{huang2026iclr-detecting,
  title     = {{Detecting Misbehaviors of Large Vision-Language Models by Evidential Uncertainty Quantification}},
  author    = {Huang, Tao and Wang, Rui and Liu, Xiaofei and Qin, Yi and Duan, Li and Jing, Liping},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/huang2026iclr-detecting/}
}