Forecast Aggregation via Recalibration
Abstract
It is known that the average of many forecasts about a future event tends to outperform the individual assessments. With the goal of further improving forecast performance, this paper develops and compares a number of models for calibrating and aggregating forecasts that exploit the well-known fact that individuals exhibit systematic biases during judgment and elicitation. All of the models recalibrate judgments or mean judgments via a two-parameter calibration function, and they differ in terms of whether (1) the calibration function is applied before or after averaging, (2) averaging is done in probability or log-odds space, and (3) individual differences are captured via hierarchical modeling. Of the non-hierarchical models, the one that first recalibrates the individual judgments and then averages them in log-odds space performs best relative to simple averaging, with a 26.7% improvement in Brier score and better performance on 86% of the individual problems. The hierarchical version of this model does slightly better in terms of mean Brier score (28.2% improvement) and slightly worse in terms of individual problems (85%).
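The best-performing non-hierarchical model described above recalibrates each individual judgment with a two-parameter function and then averages the recalibrated forecasts in log-odds space. Below is a minimal sketch of that pipeline, assuming the calibration takes the common linear-in-log-odds (LLO) form f(p) = δ·p^γ / (δ·p^γ + (1−p)^γ); the function names and the parameter values are illustrative assumptions, not the paper's fitted models.

```python
import numpy as np

def llo_calibrate(p, gamma, delta):
    """Two-parameter recalibration of probability forecasts.

    Assumes the linear-in-log-odds (LLO) form; gamma and delta here are
    illustrative, not values fitted in the paper.
    """
    p = np.clip(p, 1e-6, 1 - 1e-6)  # keep probabilities away from 0 and 1
    return delta * p**gamma / (delta * p**gamma + (1 - p)**gamma)

def recalibrate_then_average_logodds(forecasts, gamma=2.0, delta=1.0):
    """Recalibrate each judgment, then average in log-odds space."""
    calibrated = llo_calibrate(np.asarray(forecasts, dtype=float), gamma, delta)
    log_odds = np.log(calibrated / (1 - calibrated))
    mean_lo = log_odds.mean()
    return 1 / (1 + np.exp(-mean_lo))  # map the mean log-odds back to a probability

# Example: five forecasters' probabilities for the same event
forecasts = [0.55, 0.60, 0.70, 0.65, 0.58]
print(recalibrate_then_average_logodds(forecasts))
```

Averaging in log-odds space tends to push the aggregate away from 0.5 relative to a simple probability average, which is one way such models can counteract the underconfidence typical of averaged forecasts.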
Cite
Text
Turner et al. "Forecast Aggregation via Recalibration." Machine Learning, 2014. doi:10.1007/s10994-013-5401-4

Markdown

[Turner et al. "Forecast Aggregation via Recalibration." Machine Learning, 2014.](https://mlanthology.org/mlj/2014/turner2014mlj-forecast/) doi:10.1007/s10994-013-5401-4

BibTeX
@article{turner2014mlj-forecast,
title = {{Forecast Aggregation via Recalibration}},
author = {Turner, Brandon M. and Steyvers, Mark and Merkle, Edgar C. and Budescu, David V. and Wallsten, Thomas S.},
journal = {Machine Learning},
year = {2014},
pages = {261-289},
doi = {10.1007/s10994-013-5401-4},
volume = {95},
url = {https://mlanthology.org/mlj/2014/turner2014mlj-forecast/}
}