Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance

Abstract

The advancement of Large Vision-Language Models (LVLMs) has increasingly highlighted a critical issue: their tendency to hallucinate objects that do not exist in the image. To address this issue, previous work has focused on using specially curated datasets or powerful LLMs (e.g., GPT-3.5) to rectify the outputs of LVLMs. However, these approaches require either expensive training/fine-tuning or API access to advanced LLMs for post-generation correction. In response to these limitations, we propose **M**itigating hallucin**A**tion via image-g**R**ounded gu**I**da**N**c**E** (MARINE), a framework that is both _training-free_ and _API-free_. MARINE reduces object hallucinations effectively and efficiently during inference by introducing image-grounded guidance to LVLMs. It does so by leveraging open-source vision models to extract object-level information, which improves the precision of LVLM-generated content. The framework's flexibility further allows multiple vision models to be integrated, enabling more reliable and robust object-level guidance. Through comprehensive evaluations across popular LVLMs with diverse evaluation metrics and benchmarks, we demonstrate the effectiveness of MARINE, which even outperforms existing fine-tuning-based methods. It also consistently reduces hallucinations in GPT-4V-assisted evaluation while preserving the level of detail in the LVLMs' generations.
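
The abstract does not spell out how image-grounded guidance enters decoding. As a minimal sketch, assume it works like classifier-free guidance: at each decoding step the next-token log-probabilities from the original prompt are interpolated with those from a prompt augmented with objects detected by a vision model. The function names, the guidance weight `gamma`, the three-word vocabulary, and the logit values below are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def guided_log_probs(original_logits, grounded_logits, gamma):
    """Interpolate two next-token distributions in log space.

    Assumption (not from the abstract): gamma = 0 keeps the original model
    output, gamma = 1 uses only the image-grounded output.
    """
    if len(original_logits) != len(grounded_logits):
        raise ValueError("both logit vectors must cover the same vocabulary")
    orig_lp = log_softmax(original_logits)
    grnd_lp = log_softmax(grounded_logits)
    mixed = [(1.0 - gamma) * o + gamma * g for o, g in zip(orig_lp, grnd_lp)]
    return log_softmax(mixed)  # renormalize so the result is a distribution


def greedy_token(log_probs):
    """Return the index of the most likely token."""
    return max(range(len(log_probs)), key=log_probs.__getitem__)


# Hypothetical toy vocabulary and logits for one decoding step.
vocab = ["dog", "frisbee", "cat"]
original = [2.0, 1.0, 2.5]   # unguided model favors "cat" (not in the image)
grounded = [2.2, 1.5, 0.5]   # prompt with detected objects favors "dog"

unguided_choice = vocab[greedy_token(guided_log_probs(original, grounded, 0.0))]
guided_choice = vocab[greedy_token(guided_log_probs(original, grounded, 0.7))]
print(unguided_choice, guided_choice)
```

In this toy setting, setting `gamma` to 0 reproduces the unguided choice, while a larger `gamma` shifts probability mass toward tokens supported by the detected objects. No model is retrained and no external API is called, which matches the training-free and API-free framing of the abstract.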

Cite

Text

Zhao et al. "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance." NeurIPS 2024 Workshops: SafeGenAi, 2024.

Markdown

[Zhao et al. "Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance." NeurIPS 2024 Workshops: SafeGenAi, 2024.](https://mlanthology.org/neuripsw/2024/zhao2024neuripsw-mitigating/)

BibTeX

@inproceedings{zhao2024neuripsw-mitigating,
  title     = {{Mitigating Object Hallucination in Large Vision-Language Models via Image-Grounded Guidance}},
  author    = {Zhao, Linxi and Deng, Yihe and Zhang, Weitong and Gu, Quanquan},
  booktitle = {NeurIPS 2024 Workshops: SafeGenAi},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/zhao2024neuripsw-mitigating/}
}