How Many Opinions Does Your LLM Have? Improving Uncertainty Estimation in NLG

Abstract

Large language models (LLMs) suffer from hallucination, i.e., they generate text that is not factual. Hallucinations impede many applications of LLMs in society and industry because they make LLMs untrustworthy. It has been suggested that hallucinations result from predictive uncertainty: if an LLM is uncertain about the semantic meaning it should generate next, it is likely to start hallucinating. We introduce Semantically Diverse Language Generation (SDLG) to quantify the predictive uncertainty of LLMs. Our method detects whether a generated text is hallucinated by providing a precise measure of aleatoric semantic uncertainty. Experiments demonstrate that SDLG consistently outperforms existing methods while being the most computationally efficient, setting a new standard for uncertainty estimation in NLG.
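
The abstract describes the idea only at a high level. As a rough, non-authoritative illustration of this general family of semantic-uncertainty methods (not the authors' SDLG algorithm itself), the sketch below clusters several sampled generations by semantic equivalence and computes the entropy over the resulting clusters. The `semantically_equivalent` check is a crude token-overlap placeholder standing in for a proper entailment-based check, and the per-sample probabilities are assumed to be supplied by the model.

import math
import re


def _tokens(s: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def semantically_equivalent(a: str, b: str, threshold: float = 0.6) -> bool:
    """Placeholder equivalence check: Jaccard overlap of word tokens.
    In practice this would typically be a bidirectional-entailment test
    with an NLI model; the heuristic keeps the sketch self-contained."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold


def semantic_uncertainty(samples: list[tuple[str, float]]) -> float:
    """Cluster sampled generations by semantic equivalence and return the
    entropy (in nats) over the aggregated cluster probabilities."""
    clusters: list[list[int]] = []  # each cluster holds sample indices
    for i, (text, _) in enumerate(samples):
        for cluster in clusters:
            if semantically_equivalent(text, samples[cluster[0]][0]):
                cluster.append(i)
                break
        else:
            clusters.append([i])

    # Sum (normalized) sample probabilities within each semantic cluster.
    total = sum(p for _, p in samples)
    cluster_probs = [sum(samples[i][1] for i in c) / total for c in clusters]
    return -sum(p * math.log(p) for p in cluster_probs if p > 0)


# Toy usage: three generations for the same prompt; the first two share
# the same meaning, so they collapse into one semantic cluster.
samples = [
    ("The Eiffel Tower is in Paris.", 0.5),
    ("Paris is home to the Eiffel Tower.", 0.3),
    ("The Colosseum is located in Rome.", 0.2),
]
print(f"semantic uncertainty: {semantic_uncertainty(samples):.3f} nats")

A low entropy here means the sampled generations agree on one meaning, while a high entropy signals that the model entertains several conflicting meanings, which is the signal such methods use to flag likely hallucinations.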

Cite

Text

Aichberger et al. "How Many Opinions Does Your LLM Have? Improving Uncertainty Estimation in NLG." ICLR 2024 Workshops: R2-FM, 2024.

Markdown

[Aichberger et al. "How Many Opinions Does Your LLM Have? Improving Uncertainty Estimation in NLG." ICLR 2024 Workshops: R2-FM, 2024.](https://mlanthology.org/iclrw/2024/aichberger2024iclrw-many/)

BibTeX

@inproceedings{aichberger2024iclrw-many,
  title     = {{How Many Opinions Does Your LLM Have? Improving Uncertainty Estimation in NLG}},
  author    = {Aichberger, Lukas and Schweighofer, Kajetan and Ielanskyi, Mykyta and Hochreiter, Sepp},
  booktitle = {ICLR 2024 Workshops: R2-FM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/aichberger2024iclrw-many/}
}