On Subjective Uncertainty Quantification and Calibration in Natural Language Generation

Abstract

Applications of large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging. This is due to the need to identify task-specific uncertainties (e.g., about the semantics), which appear difficult to define in general cases. This work addresses these challenges from the perspective of Bayesian decision theory, starting from the assumption that our utility is characterized by a similarity measure that compares a generated response with a hypothetical true response. We discuss how this assumption enables principled quantification of the model’s subjective uncertainty and its calibration. We further derive a measure for epistemic uncertainty, based on a missing-data perspective and its characterization as an excess risk. The proposed methods can be applied to black-box language models. We illustrate the methods on question answering and machine translation tasks. Our experiments provide a principled evaluation of task-specific calibration, and demonstrate that epistemic uncertainty offers a promising deferral strategy for efficient data acquisition in in-context learning.
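To make the similarity-based utility concrete, the following is a minimal sketch of one way a decision-theoretic uncertainty score could be estimated from samples of a black-box model. It is an illustration under stated assumptions, not the paper's implementation: the function names (`token_overlap`, `expected_utility`, `subjective_uncertainty`) are hypothetical, the Jaccard token overlap is a toy stand-in for a task-specific similarity measure, the model's own samples are assumed to approximate its belief over the hypothetical true response, and the candidate responses are restricted to those same samples.

```python
from typing import Callable, Sequence, Tuple


def token_overlap(a: str, b: str) -> float:
    """Toy similarity in [0, 1]: Jaccard overlap of lowercase tokens.

    Assumption: stands in for whatever task-specific similarity is used.
    """
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def expected_utility(
    candidate: str,
    samples: Sequence[str],
    similarity: Callable[[str, str], float],
) -> float:
    """Monte Carlo estimate of the expected similarity between a candidate
    and the hypothetical true response, with model samples playing the role
    of draws of that true response (an assumption of this sketch)."""
    return sum(similarity(candidate, s) for s in samples) / len(samples)


def subjective_uncertainty(
    samples: Sequence[str],
    similarity: Callable[[str, str], float] = token_overlap,
) -> Tuple[str, float]:
    """Return (best_response, uncertainty).

    best_response maximizes the estimated expected utility over the sampled
    candidates; uncertainty is 1 minus that maximum, so it lies in [0, 1]
    whenever the similarity does.
    """
    if not samples:
        raise ValueError("need at least one sampled response")
    best = max(samples, key=lambda c: expected_utility(c, samples, similarity))
    return best, 1.0 - expected_utility(best, samples, similarity)
```

With this sketch, samples that all agree give an uncertainty of zero, while samples that share no tokens push the score toward one. Swapping in a different `similarity` changes which disagreements count as uncertainty, which is the sense in which the score is task-specific.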

Cite

Text

Wang and Holmes. "On Subjective Uncertainty Quantification and Calibration in Natural Language Generation." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.

Markdown

[Wang and Holmes. "On Subjective Uncertainty Quantification and Calibration in Natural Language Generation." Proceedings of The 28th International Conference on Artificial Intelligence and Statistics, 2025.](https://mlanthology.org/aistats/2025/wang2025aistats-subjective/)

BibTeX

@inproceedings{wang2025aistats-subjective,
  title     = {{On Subjective Uncertainty Quantification and Calibration in Natural Language Generation}},
  author    = {Wang, Ziyu and Holmes, Christopher C.},
  booktitle = {Proceedings of The 28th International Conference on Artificial Intelligence and Statistics},
  year      = {2025},
  pages     = {3799-3807},
  volume    = {258},
  url       = {https://mlanthology.org/aistats/2025/wang2025aistats-subjective/}
}