LLM-Check: Investigating Detection of Hallucinations in Large Language Models

Abstract

While Large Language Models (LLMs) have become immensely popular due to their outstanding performance on a broad range of tasks, these models are prone to producing hallucinations: outputs that are fallacious or fabricated yet often appear plausible or tenable at a glance. In this paper, we conduct a comprehensive investigation into the nature of hallucinations within LLMs and explore effective techniques for detecting such inaccuracies in various real-world settings. Prior approaches to detecting hallucinations in LLM outputs, such as consistency checks or retrieval-based methods, typically assume access to multiple model responses or large databases. These techniques, however, tend to be computationally expensive in practice, thereby limiting their applicability to real-time analysis. In contrast, in this work, we seek to identify hallucinations within a single response, in both white-box and black-box settings, by analyzing the internal hidden states, attention maps, and output prediction probabilities of an auxiliary LLM. In addition, we study hallucination detection in scenarios where ground-truth references are available, such as in Retrieval-Augmented Generation (RAG). We demonstrate that the proposed detection methods are extremely compute-efficient, with speedups of up to 45x and 450x over other baselines, while achieving significant improvements in detection performance across diverse datasets.
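
To make the single-response, probability-based signal concrete, the following is a minimal sketch of scoring a response using the output prediction probabilities of an auxiliary LLM in one forward pass. It is not the paper's exact detection scores (which also draw on hidden states and attention maps); the model name, helper function, and threshold below are illustrative assumptions.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # stand-in auxiliary LLM chosen for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()


@torch.no_grad()
def mean_token_logprob(prompt: str, response: str) -> float:
    """Mean log-probability assigned to the response tokens given the prompt,
    computed in a single forward pass of the auxiliary LLM."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    full_ids = tokenizer(prompt + response, return_tensors="pt").input_ids

    logits = model(full_ids).logits              # (1, seq_len, vocab_size)
    log_probs = torch.log_softmax(logits, dim=-1)

    # Log-probability of each actual next token (position i predicts token i+1).
    next_token_logprobs = log_probs[0, :-1].gather(
        1, full_ids[0, 1:].unsqueeze(-1)
    ).squeeze(-1)

    # Assumes the prompt tokenizes identically as a prefix of prompt + response.
    response_start = prompt_ids.shape[1] - 1
    return next_token_logprobs[response_start:].mean().item()


if __name__ == "__main__":
    prompt = "Q: What is the capital of France?\nA:"
    score = mean_token_logprob(prompt, " The capital of France is Paris.")
    # Lower (more negative) scores mark tokens the auxiliary model finds
    # surprising, one possible hallucination signal. The threshold here is a
    # hypothetical calibration choice, not a value from the paper.
    THRESHOLD = -5.0
    print(f"mean token log-prob: {score:.3f}",
          "-> flag for review" if score < THRESHOLD else "-> looks consistent")

In a white-box setting, the same forward pass also exposes the hidden states and attention maps that the paper's other detection scores operate on, so no additional model queries are needed beyond this single evaluation.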

Cite

Text

Sriramanan et al. "LLM-Check: Investigating Detection of Hallucinations in Large Language Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-1077

Markdown

[Sriramanan et al. "LLM-Check: Investigating Detection of Hallucinations in Large Language Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/sriramanan2024neurips-llmcheck/) doi:10.52202/079017-1077

BibTeX

@inproceedings{sriramanan2024neurips-llmcheck,
  title     = {{LLM-Check: Investigating Detection of Hallucinations in Large Language Models}},
  author    = {Sriramanan, Gaurang and Bharti, Siddhant and Sadasivan, Vinu Sankar and Saha, Shoumik and Kattakinda, Priyatham and Feizi, Soheil},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-1077},
  url       = {https://mlanthology.org/neurips/2024/sriramanan2024neurips-llmcheck/}
}