Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis

Abstract

Large language models (LLMs) have shown exceptional performance across various domains. However, LLMs are prone to hallucinating facts and generating non-factual responses, which can undermine their reliability in real-world applications. Existing hallucination detection methods depend on external resources, incur substantial time overhead, struggle to overcome LLMs' intrinsic limitations, or rely on insufficient modeling. In this paper, we propose MHAD, a novel hallucination detection method based on internal representations. MHAD uses linear probing to select neurons and layers within an LLM; the selected neurons and layers are shown to exhibit significant awareness of hallucinations at the initial and final generation steps. Concatenating the outputs of the selected neurons in the selected layers at the initial and final generation steps yields a hallucination awareness vector, which an MLP then uses for precise hallucination detection. Additionally, we introduce SOQHD, a novel benchmark for evaluating hallucination detection in Open-Domain QA (ODQA). Extensive experiments show that MHAD outperforms existing hallucination detection methods across multiple LLMs.
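
The construction of the hallucination awareness vector described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a linear probe has already been trained per layer, that layers are ranked by probe accuracy, and that neurons are ranked by absolute probe weight; the abstract does not state these selection rules. The function names, the data layout (`activations[layer][step]`), and the concatenation order are likewise hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: activations[layer][step] holds that layer's neuron
# outputs at one generation step (step 0 = initial, step -1 = final).
Activations = Dict[int, List[List[float]]]


def select_layers(probe_accuracy: Dict[int, float], num_layers: int) -> List[int]:
    """Keep the layers whose linear probe scored highest (assumed rule)."""
    ranked = sorted(probe_accuracy, key=lambda layer: probe_accuracy[layer], reverse=True)
    return sorted(ranked[:num_layers])


def select_neurons(probe_weights: Sequence[float], num_neurons: int) -> List[int]:
    """Keep the neurons with the largest absolute probe weight (assumed rule)."""
    ranked = sorted(range(len(probe_weights)), key=lambda i: abs(probe_weights[i]), reverse=True)
    return sorted(ranked[:num_neurons])


def awareness_vector(
    activations: Activations,
    layers: Sequence[int],
    neurons_by_layer: Dict[int, Sequence[int]],
) -> List[float]:
    """Concatenate selected neuron outputs at the initial and final steps."""
    vector: List[float] = []
    for step in (0, -1):  # initial generation step, then final generation step
        for layer in layers:
            step_output = activations[layer][step]
            vector.extend(step_output[i] for i in neurons_by_layer[layer])
    return vector


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not results from the paper.
    accuracy = {0: 0.55, 1: 0.80, 2: 0.72}
    weights = {1: [0.1, -0.9, 0.3], 2: [0.7, 0.0, -0.2]}
    acts = {
        1: [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]],
        2: [[10.0, 11.0, 12.0], [13.0, 14.0, 15.0], [16.0, 17.0, 18.0]],
    }
    layers = select_layers(accuracy, num_layers=2)
    neurons = {layer: select_neurons(weights[layer], num_neurons=2) for layer in layers}
    print(awareness_vector(acts, layers, neurons))
```

In a full pipeline, the resulting vector would be the input to the MLP classifier mentioned in the abstract; the MLP itself is omitted from this sketch.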

Cite

Text

Zhang et al. "Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis." International Joint Conference on Artificial Intelligence, 2025. doi:10.24963/IJCAI.2025/929

Markdown

[Zhang et al. "Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis." International Joint Conference on Artificial Intelligence, 2025.](https://mlanthology.org/ijcai/2025/zhang2025ijcai-detecting/) doi:10.24963/IJCAI.2025/929

BibTeX

@inproceedings{zhang2025ijcai-detecting,
  title     = {{Detecting Hallucination in Large Language Models Through Deep Internal Representation Analysis}},
  author    = {Zhang, Luan and Song, Dandan and Wu, Zhijing and Tian, Yuhang and Zhou, Changzhi and Xu, Jing and Yang, Ziyi and Zhang, Shuhao},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8357--8365},
  doi       = {10.24963/IJCAI.2025/929},
  url       = {https://mlanthology.org/ijcai/2025/zhang2025ijcai-detecting/}
}