Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-Judge

Abstract

This paper explores generalised probabilistic modelling and uncertainty estimation in comparative LLM-as-a-judge frameworks. We show that existing Product-of-Experts methods are special cases of a broader framework, enabling diverse modelling options. Furthermore, we propose improved uncertainty estimates for individual comparisons, enabling more efficient selection of comparisons and achieving strong performance with fewer evaluations. We also introduce a method for estimating overall ranking uncertainty. Finally, we demonstrate that combining absolute and comparative scoring improves performance. Experiments show that the specific expert model has a limited impact on final rankings, but our proposed uncertainty estimates, especially the probability of reordering, significantly improve the efficiency of systems, reducing the number of needed comparisons by $\sim$50%. Furthermore, ranking-level uncertainty metrics can be used to identify low-performing predictions, where the nature of the probabilistic model has a notable impact on the quality of the overall uncertainty.

Cite

Text

Fathullah and Gales. "Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-Judge." Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, 2025.

Markdown

[Fathullah and Gales. "Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-Judge." Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence, 2025.](https://mlanthology.org/uai/2025/fathullah2025uai-generalised/)

BibTeX

@inproceedings{fathullah2025uai-generalised,
  title     = {{Generalised Probabilistic Modelling and Improved Uncertainty Estimation in Comparative LLM-as-a-Judge}},
  author    = {Fathullah, Yassir and Gales, Mark},
  booktitle = {Proceedings of the Forty-first Conference on Uncertainty in Artificial Intelligence},
  year      = {2025},
  pages     = {1266--1288},
  volume    = {286},
  url       = {https://mlanthology.org/uai/2025/fathullah2025uai-generalised/}
}