Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks

Abstract

Large language models (LLMs) show remarkable promise for democratizing automated reasoning by generating formal specifications. However, a fundamental tension exists: LLMs are probabilistic, while formal verification demands deterministic guarantees. This paper addresses this epistemological gap by comprehensively investigating failure modes and uncertainty quantification (UQ) in LLM-generated formal artifacts. Our systematic evaluation of five frontier LLMs reveals that Satisfiability Modulo Theories (SMT)-based autoformalization has a domain-specific impact on accuracy (from +34.8% on logical tasks to -44.5% on factual ones), and that known UQ techniques such as the entropy of token probabilities fail to identify these errors. We introduce a probabilistic context-free grammar (PCFG) framework to model LLM outputs, yielding a refined uncertainty taxonomy. We find that uncertainty signals are task-dependent (e.g., grammar entropy for logic, AUROC > 0.93). Finally, a lightweight fusion of these signals enables selective verification, drastically reducing errors (14-100%) with minimal abstention and transforming LLM-driven formalization into a reliable engineering discipline.
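The abstract contrasts token-probability entropy (a baseline UQ signal) with a selective-verification policy that abstains when uncertainty is high. As a minimal sketch of these two ideas, the snippet below computes the mean per-step Shannon entropy of a generated sequence and applies a simple abstention threshold; the function names, the data layout (one `{token: probability}` dict per generation step), and the threshold value are illustrative assumptions, not the paper's actual implementation.

```python
import math

def sequence_entropy(token_dists):
    """Mean per-step Shannon entropy (bits) over the token probability
    distributions produced at each generation step.
    `token_dists` is assumed to be a list of {token: prob} dicts."""
    step_entropies = [
        -sum(p * math.log2(p) for p in dist.values() if p > 0)
        for dist in token_dists
    ]
    return sum(step_entropies) / len(step_entropies)

def selective_verify(uncertainty, threshold=1.0):
    """Toy selective-verification policy: abstain (defer the formal
    artifact to review) when the uncertainty signal exceeds a
    threshold; otherwise accept it for downstream verification."""
    return "abstain" if uncertainty > threshold else "accept"

# Two generation steps: one peaked distribution, one maximally uncertain.
dists = [{"(": 0.9, "assert": 0.1}, {"x": 0.5, "y": 0.5}]
h = sequence_entropy(dists)
decision = selective_verify(h)
```

The same accept/abstain interface works for any scalar uncertainty signal, which is how a fused score (as the paper describes) could drop in without changing the policy.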

Cite

Text

Ganguly et al. "Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks." Advances in Neural Information Processing Systems, 2025.

Markdown

[Ganguly et al. "Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/ganguly2025neurips-grammars/)

BibTeX

@inproceedings{ganguly2025neurips-grammars,
  title     = {{Grammars of Formal Uncertainty: When to Trust LLMs in Automated Reasoning Tasks}},
  author    = {Ganguly, Debargha and Singh, Vikash and Sankar, Sreehari and Zhang, Biyao and Zhang, Xuecen and Iyengar, Srinivasan and Han, Xiaotian and Sharma, Amit and Kalyanaraman, Shivkumar and Chaudhary, Vipin},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/ganguly2025neurips-grammars/}
}