ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models

Abstract

Recent Large Vision-Language Models (LVLMs) have introduced a new paradigm for understanding and reasoning about image input through textual responses. Although they have achieved remarkable performance across a range of multi-modal tasks, they face the persistent challenge of hallucination, which introduces practical weaknesses and raises concerns about their reliable deployment in real-world applications. Existing work has explored contrastive decoding approaches to mitigate this issue, where the output of the original LVLM is compared and contrasted with that of a perturbed version. However, these methods require two or more queries that slow down LVLM response generation, making them less suitable for real-time applications. To overcome this limitation, we propose ONLY, a training-free decoding approach that requires only a single query and a one-layer intervention during decoding, enabling efficient real-time deployment. Specifically, we enhance textual outputs by selectively amplifying crucial textual information using a text-to-visual entropy ratio for each token. Extensive experimental results demonstrate that our ONLY approach consistently outperforms state-of-the-art methods across various benchmarks while requiring minimal implementation effort and computational cost. Code is available at https://github.com/zifuwan/ONLY.
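The two ideas named in the abstract can be illustrated with a rough sketch. The abstract does not give formulas, so everything below is an assumption rather than the authors' implementation. `contrastive_logits` uses one commonly seen contrastive-decoding form, `(1 + alpha) * original - alpha * perturbed`, and needs two forward passes. `text_to_visual_entropy_ratio` is a hypothetical reading of the "text-to-visual entropy ratio": the Shannon entropy of a token's attention over text positions divided by its entropy over visual positions. The function names, the `alpha` and `eps` parameters, and the use of attention weights as input are all illustrative choices.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def contrastive_logits(original, perturbed, alpha=1.0):
    """Assumed baseline: contrast logits from two separate queries.

    This is a common contrastive-decoding form, not the ONLY method.
    """
    return [(1.0 + alpha) * o - alpha * p for o, p in zip(original, perturbed)]


def text_to_visual_entropy_ratio(attn_text, attn_visual, eps=1e-8):
    """Hypothetical per-token ratio of text to visual attention entropy.

    attn_text / attn_visual are one token's raw attention weights over
    the text positions and the visual positions (assumed inputs).
    """
    h_text = entropy(normalize(attn_text))
    h_visual = entropy(normalize(attn_visual))
    return h_text / (h_visual + eps)


if __name__ == "__main__":
    # Two-query baseline: the second list stands in for a perturbed query.
    print(contrastive_logits([2.0, 1.0], [1.0, 1.0], alpha=1.0))
    # Single-query statistic: computed from one forward pass only.
    print(text_to_visual_entropy_ratio([0.5, 0.5], [0.25, 0.25, 0.25, 0.25]))
```

The contrast worth noting is cost: the baseline needs logits from a second, perturbed query, while an entropy-ratio statistic of this kind could be computed from quantities already available in a single forward pass, which is the efficiency argument the abstract makes.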

Cite

Text

Wan et al. "ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models." International Conference on Computer Vision, 2025.

Markdown

[Wan et al. "ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/wan2025iccv-only/)

BibTeX

@inproceedings{wan2025iccv-only,
  title     = {{ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models}},
  author    = {Wan, Zifu and Zhang, Ce and Yong, Silong and Ma, Martin Q. and Stepputtis, Simon and Morency, Louis-Philippe and Ramanan, Deva and Sycara, Katia and Xie, Yaqi},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {3225--3234},
  url       = {https://mlanthology.org/iccv/2025/wan2025iccv-only/}
}