Probabilistic Reasoning with LLMs for Privacy Risk Estimation

Abstract

Probabilistic reasoning is a key aspect of both human and artificial intelligence that allows for handling uncertainty and ambiguity in decision-making. In this paper, we introduce a new numerical reasoning task under uncertainty for large language models, focusing on estimating the privacy risk of user-generated documents that contain privacy-sensitive information. We propose BRANCH, a new LLM methodology that estimates the $k$-privacy value of a text, i.e., the size of the population matching the given information. BRANCH factorizes the joint probability distribution of the personal information, modeled as random variables. The probability of each factor occurring in a population is estimated separately using a Bayesian network, and these probabilities are combined to compute the final $k$-value. Our experiments show that this method successfully estimates the $k$-value 73% of the time, a 13% increase compared to o3-mini with chain-of-thought reasoning. We also find that LLM uncertainty is a good indicator of accuracy, as high-variance predictions are 37.47% less accurate on average.
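
A minimal sketch (not the authors' released code) of how the combination step described above might look, assuming the Bayesian network has already produced a conditional probability for each factor and that the $k$-value is taken as the expected number of matching people in a reference population; the function and variable names below are hypothetical:

import math

def estimate_k(factor_probs, population_size):
    # factor_probs: conditional probabilities, one per factor of the
    # Bayesian-network factorization of the disclosed personal information.
    # population_size: size of the reference population.
    joint_prob = math.prod(factor_probs)   # probability that a random person matches all factors
    return population_size * joint_prob    # expected number of matching people, i.e. the k-value

# Example: three disclosed attributes with estimated probabilities
# 0.1, 0.05, and 0.5 in a population of 10 million.
print(estimate_k([0.1, 0.05, 0.5], 10_000_000))  # 25000.0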

Cite

Text

Zheng et al. "Probabilistic Reasoning with LLMs for Privacy Risk Estimation." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zheng et al. "Probabilistic Reasoning with LLMs for Privacy Risk Estimation." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zheng2025neurips-probabilistic/)

BibTeX

@inproceedings{zheng2025neurips-probabilistic,
  title     = {{Probabilistic Reasoning with LLMs for Privacy Risk Estimation}},
  author    = {Zheng, Jonathan and Ritter, Alan and Das, Sauvik and Xu, Wei},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zheng2025neurips-probabilistic/}
}