Consistency Checks for Language Model Forecasters
Abstract
Forecasting is a task that is difficult to evaluate: the ground truth can only be known in the future. Recent work showing LLM forecasters rapidly approaching human-level performance raises the question: how can we benchmark and evaluate these forecasters *instantaneously*? Following the consistency check framework, we measure the performance of forecasters in terms of the consistency of their predictions on different logically-related questions. We propose a new, general consistency metric based on *arbitrage*: for example, if a forecasting AI illogically predicts that both the Democratic and Republican parties have 60% probability of winning the 2024 US presidential election, an arbitrageur could trade against the forecaster's predictions and make a profit. We build an automated evaluation system that generates a set of base questions, instantiates consistency checks from these questions, elicits the predictions of the forecaster, and measures the consistency of the predictions. We then build a standard, proper-scoring-rule forecasting benchmark, and show that our (instantaneous) consistency metrics correlate strongly with LLM forecasters' ground truth Brier scores (which are only known in the future). We also release a consistency benchmark that resolves in 2028, providing a long-term evaluation tool for forecasting.
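The arbitrage idea in the abstract can be made concrete with a minimal sketch (this is an illustration of the general principle, not the paper's actual metric): if a forecaster quotes probabilities for an event and its logical negation that do not sum to 1, a trader can buy or sell one contract on each side and lock in a riskless profit equal to the gap.

```python
def arbitrage_profit(p_event: float, p_complement: float) -> float:
    """Guaranteed profit per unit stake from trading against a
    forecaster who quotes p_event and p_complement for an event
    and its logical negation.

    If the quotes sum to more than 1, sell one contract on each side:
    collect p_event + p_complement now, pay out exactly 1 at resolution.
    If they sum to less than 1, buy one contract on each side instead.
    A consistent forecaster (quotes summing to 1) admits no arbitrage.
    """
    return abs(p_event + p_complement - 1.0)

# The abstract's example: 60% for both the Democratic and Republican
# parties, treated here as complementary outcomes.
print(round(arbitrage_profit(0.60, 0.60), 4))  # 0.2 per unit traded
print(round(arbitrage_profit(0.70, 0.30), 4))  # 0.0: consistent, no arbitrage
```

The paper's actual consistency metrics generalize this idea beyond simple negation to other logical relations between questions; the function name and the two-outcome setup above are illustrative assumptions.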
Cite
Text
Paleka et al. "Consistency Checks for Language Model Forecasters." International Conference on Learning Representations, 2025.
Markdown
[Paleka et al. "Consistency Checks for Language Model Forecasters." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/paleka2025iclr-consistency/)
BibTeX
@inproceedings{paleka2025iclr-consistency,
  title = {{Consistency Checks for Language Model Forecasters}},
  author = {Paleka, Daniel and Sudhir, Abhimanyu Pallavi and Alvarez, Alejandro and Bhat, Vineeth and Shen, Adam and Wang, Evan and Tramèr, Florian},
  booktitle = {International Conference on Learning Representations},
  year = {2025},
  url = {https://mlanthology.org/iclr/2025/paleka2025iclr-consistency/}
}