ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models

Park, Yeji; Lee, Deokyeong; Choe, Junsuk; Chang, Buru

doi:10.1609/AAAI.V39I6.32689

AAAI 2025 pp. 6434-6442

Abstract

Hallucinations in Multimodal Large Language Models (MLLMs), where generated responses fail to accurately reflect the given image, pose a significant challenge to their reliability. To address this, we introduce ConVis, a novel training-free contrastive decoding method. ConVis leverages a text-to-image (T2I) generation model to semantically reconstruct the given image from hallucinated captions. By comparing the contrasting probability distributions produced by the original and reconstructed images, ConVis enables MLLMs to capture visual contrastive signals that penalize hallucination generation. Notably, this method operates purely within the decoding process, eliminating the need for additional data or model updates. Our extensive experiments on five popular benchmarks demonstrate that ConVis effectively reduces hallucinations across various MLLMs, highlighting its potential to enhance model reliability.
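
The decoding-time contrast described in the abstract can be illustrated with a small sketch. The abstract does not give the scoring formula, so the form used here, (1 + alpha) * log p(y | original image) - alpha * log p(y | reconstructed image), is a generic contrastive decoding assumption and not necessarily the paper's exact rule. The function names, the `alpha` parameter, the three-token vocabulary, and the logit values are all hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def contrastive_scores(logits_original, logits_reconstructed, alpha=1.0):
    """Adjusted next-token scores (assumed generic contrastive form).

    Tokens that become more likely when the model sees the image
    reconstructed from a hallucinated caption are pushed down.
    """
    lp_o = log_softmax(logits_original)
    lp_r = log_softmax(logits_reconstructed)
    return [(1 + alpha) * o - alpha * r for o, r in zip(lp_o, lp_r)]


def greedy_pick(scores):
    """Index of the highest-scoring token."""
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical toy vocabulary and logits, for illustration only.
vocab = ["dog", "frisbee", "grass"]
# Conditioned on the original image: "frisbee" narrowly wins (a hallucination).
logits_original = [2.0, 2.1, 0.5]
# Conditioned on the T2I reconstruction of the hallucinated caption,
# which depicts the frisbee, so that token is boosted further.
logits_reconstructed = [2.0, 3.5, 0.5]

baseline = vocab[greedy_pick(logits_original)]
adjusted = vocab[greedy_pick(contrastive_scores(logits_original, logits_reconstructed))]
print(baseline, adjusted)
```

In this toy case plain greedy decoding picks "frisbee", while the contrastive scores favor "dog". Setting `alpha=0` recovers the original distribution, which is consistent with the method acting only at decoding time, without model updates.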

Cite

Text

Park et al. "ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I6.32689

Markdown

[Park et al. "ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/park2025aaai-convis/) doi:10.1609/AAAI.V39I6.32689

BibTeX

@inproceedings{park2025aaai-convis,
  title     = {{ConVis: Contrastive Decoding with Hallucination Visualization for Mitigating Hallucinations in Multimodal Large Language Models}},
  author    = {Park, Yeji and Lee, Deokyeong and Choe, Junsuk and Chang, Buru},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {6434-6442},
  doi       = {10.1609/AAAI.V39I6.32689},
  url       = {https://mlanthology.org/aaai/2025/park2025aaai-convis/}
}