Position Paper on Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-Test Probability

Abstract

Large language models (LLMs) are being explored for diagnostic decision support, yet their ability to estimate pre-test probabilities, vital for clinical decision-making, remains limited. This study evaluates two LLMs, Mistral-7B and Llama3-70B, on three diagnosis tasks using structured electronic health record data. We examine three current methods for extracting probability estimates from LLMs and reveal their limitations. We aim to highlight the need for improved techniques in LLM confidence estimation.

Cite

Text

Gao et al. "Position Paper on Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-Test Probability." NeurIPS 2024 Workshops: GenAI4Health, 2024.

Markdown

[Gao et al. "Position Paper on Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-Test Probability." NeurIPS 2024 Workshops: GenAI4Health, 2024.](https://mlanthology.org/neuripsw/2024/gao2024neuripsw-position/)

BibTeX

@inproceedings{gao2024neuripsw-position,
  title     = {{Position Paper on Diagnostic Uncertainty Estimation from Large Language Models: Next-Word Probability Is Not Pre-Test Probability}},
  author    = {Gao, Yanjun and Myers, Skatje and Chen, Shan and Dligach, Dmitriy and Miller, Timothy A and Bitterman, Danielle and Chen, Guanhua and Mayampurath, Anoop and Churpek, Matthew and Afshar, Majid},
  booktitle = {NeurIPS 2024 Workshops: GenAI4Health},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/gao2024neuripsw-position/}
}