Epistemic Integrity in Large Language Models
Abstract
Large language models are increasingly relied upon as sources of information, but their propensity for generating false or misleading statements with high confidence poses risks for users and society. In this paper, we confront the critical problem of epistemic miscalibration, where a model's linguistic assertiveness fails to reflect its true internal certainty. We introduce a new human-labeled dataset and a novel method for measuring the linguistic assertiveness of Large Language Models that cuts error rates by over 50% relative to previous benchmarks. Validated across multiple datasets, our method reveals a stark misalignment between how confidently models linguistically present information and their actual accuracy. Further human evaluations confirm the severity of this miscalibration. This evidence underscores the urgent risk that the overstated certainty of Large Language Models may mislead users on a massive scale. Our framework provides a crucial step forward in diagnosing and correcting this miscalibration, offering a path to safer and more trustworthy AI across domains.
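As a rough illustration of the miscalibration described above (not the paper's method), the sketch below computes a simple gap between mean linguistic assertiveness and factual accuracy over a handful of hypothetical claim scores; the scores, labels, and variable names are placeholders for whatever assertiveness measure and ground truth one actually has.

```python
# Minimal sketch of an "epistemic miscalibration" gap:
# how much more certain a model *sounds* than it actually *is*.
# All numbers below are hypothetical placeholders.

from statistics import mean

# assertiveness[i]: how confidently claim i is phrased, scored in [0, 1]
# correct[i]: whether claim i is factually correct (1) or not (0)
assertiveness = [0.95, 0.90, 0.85, 0.92, 0.60]  # hypothetical scores
correct = [1, 0, 0, 1, 1]                       # hypothetical labels

accuracy = mean(correct)
mean_assertiveness = mean(assertiveness)

# A positive gap means the model presents claims more confidently
# than its accuracy warrants.
miscalibration_gap = mean_assertiveness - accuracy
print(f"accuracy={accuracy:.2f}, "
      f"assertiveness={mean_assertiveness:.2f}, "
      f"gap={miscalibration_gap:+.2f}")
```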
Cite
Text
Ghafouri et al. "Epistemic Integrity in Large Language Models." NeurIPS 2024 Workshops: SafeGenAi, 2024.
Markdown
[Ghafouri et al. "Epistemic Integrity in Large Language Models." NeurIPS 2024 Workshops: SafeGenAi, 2024.](https://mlanthology.org/neuripsw/2024/ghafouri2024neuripsw-epistemic/)
BibTeX
@inproceedings{ghafouri2024neuripsw-epistemic,
title = {{Epistemic Integrity in Large Language Models}},
author = {Ghafouri, Bijean and Mohammadzadeh, Shahrad and Zhou, James and Nair, Pratheeksha and Tian, Jacob-Junqi and Goel, Mayank and Rabbany, Reihaneh and Godbout, Jean-François and Pelrine, Kellin},
booktitle = {NeurIPS 2024 Workshops: SafeGenAi},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/ghafouri2024neuripsw-epistemic/}
}