MHBench: Demystifying Motion Hallucination in VideoLLMs

Abstract

Like language and image LLMs, VideoLLMs are plagued by hallucination. Hallucinations in videos manifest not only in the spatial dimension, affecting the perception of whether visual objects exist (static), but also in the temporal dimension, affecting the perception of actions and events (dynamic). This paper introduces the concept of Motion Hallucination for the first time, exploring the hallucination phenomena caused by insufficient motion perception in VideoLLMs, as well as how to detect, evaluate, and mitigate them. To this end, we propose MHBench, the first benchmark for assessing motion hallucination, which consists of 1,200 videos across 20 action categories. By constructing adversarial triplets of videos (original/antonym/incomplete), we achieve a comprehensive evaluation of motion hallucination. Furthermore, we present Motion Contrastive Decoding (MotionCD), a method that applies bidirectional motion elimination between the original video and its reverse playback to construct an amateur model that removes the influence of motion while preserving visual information, thereby effectively suppressing motion hallucination. Extensive experiments on MHBench reveal that current state-of-the-art VideoLLMs suffer significantly from motion hallucination, and that introducing MotionCD effectively mitigates this issue, achieving up to a 15.1% performance improvement. We hope this work will guide future efforts in avoiding and mitigating hallucinations in VideoLLMs.
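The abstract does not spell out the MotionCD decoding rule, so the following is only a minimal sketch of generic contrastive decoding between an expert and an amateur distribution, the family of methods MotionCD belongs to. It assumes the amateur logits have already been produced by some motion-free view of the video; how the paper builds that view is not reproduced here. The function name `contrastive_decode_step` and the parameters `alpha` (contrast strength) and `beta` (plausibility cutoff) are hypothetical, chosen for illustration.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))


def contrastive_decode_step(expert_logits, amateur_logits, alpha=1.0, beta=0.1):
    """Pick the next token id by contrasting expert and amateur predictions.

    Assumption: this is the generic contrastive-decoding rule, not
    necessarily the exact rule used by MotionCD.
    """
    expert_logp = log_softmax(np.asarray(expert_logits, dtype=float))
    amateur_logp = log_softmax(np.asarray(amateur_logits, dtype=float))

    # Plausibility constraint: only keep tokens whose expert probability is
    # at least beta times the probability of the expert's top token.
    plausible = expert_logp >= np.log(beta) + expert_logp.max()

    # Boost tokens the expert prefers more strongly than the amateur does.
    scores = (1.0 + alpha) * expert_logp - alpha * amateur_logp
    scores = np.where(plausible, scores, -np.inf)
    return int(np.argmax(scores))


if __name__ == "__main__":
    # Toy vocabulary of three tokens. The expert barely prefers token 0,
    # but the amateur prefers it strongly, so the contrast selects token 1.
    expert = [2.0, 1.9, -5.0]
    amateur = [2.0, 0.0, -5.0]
    print(contrastive_decode_step(expert, amateur, alpha=1.0))
    print(contrastive_decode_step(expert, amateur, alpha=0.0))
```

With `alpha=0` the rule reduces to greedy decoding on the expert alone. The plausibility mask keeps the contrast from promoting tokens that the expert itself considers very unlikely.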

Cite

Text

Kong et al. "MHBench: Demystifying Motion Hallucination in VideoLLMs." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I4.32463

Markdown

[Kong et al. "MHBench: Demystifying Motion Hallucination in VideoLLMs." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/kong2025aaai-mhbench/) doi:10.1609/AAAI.V39I4.32463

BibTeX

@inproceedings{kong2025aaai-mhbench,
  title     = {{MHBench: Demystifying Motion Hallucination in VideoLLMs}},
  author    = {Kong, Ming and Zeng, Xianzhou and Chen, Luyuan and Li, Yadong and Yan, Bo and Zhu, Qiang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {4401--4409},
  doi       = {10.1609/AAAI.V39I4.32463},
  url       = {https://mlanthology.org/aaai/2025/kong2025aaai-mhbench/}
}