Forecast Aggregation via Recalibration

Abstract

It is known that the average of many forecasts about a future event tends to outperform the individual assessments. With the goal of further improving forecast performance, this paper develops and compares a number of models for calibrating and aggregating forecasts that exploit the well-known fact that individuals exhibit systematic biases during judgment and elicitation. All of the models recalibrate judgments or mean judgments via a two-parameter calibration function, and differ in terms of whether (1) the calibration function is applied before or after the averaging, (2) averaging is done in probability or log-odds space, and (3) individual differences are captured via hierarchical modeling. Of the non-hierarchical models, the one that first recalibrates the individual judgments and then averages them in log-odds is the best relative to simple averaging, with 26.7% improvement in Brier score and better performance on 86% of the individual problems. The hierarchical version of this model does slightly better in terms of mean Brier score (28.2%) and slightly worse in terms of individual problems (85%).
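The best-performing pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a "linear in log-odds" (LLO) style two-parameter calibration function (parameters `gamma` and `delta` here are hypothetical names), applies it to each individual forecast, averages in log-odds space, and scores the result with the Brier score.

```python
import numpy as np

def llo_calibrate(p, gamma, delta):
    """Two-parameter recalibration of a probability forecast.

    A 'linear in log-odds' (LLO) style transform: gamma scales and
    delta shifts forecasts in log-odds space. gamma = delta = 1
    leaves the forecast unchanged.
    """
    p = np.asarray(p, dtype=float)
    return (delta * p**gamma) / (delta * p**gamma + (1 - p)**gamma)

def recalibrate_then_average(probs, gamma, delta):
    """Recalibrate each individual forecast, then average in log-odds
    space and map the mean back to a probability."""
    cal = llo_calibrate(probs, gamma, delta)
    log_odds = np.log(cal / (1 - cal))
    mean_lo = log_odds.mean()
    return 1.0 / (1.0 + np.exp(-mean_lo))

def brier_score(forecast, outcome):
    """Brier score for a binary event (0 or 1); lower is better."""
    return (forecast - outcome) ** 2
```

With `gamma = delta = 1` the calibration is the identity, so the aggregate reduces to a plain log-odds (geometric-mean-of-odds) average; fitting the two parameters to past outcomes is what distinguishes the recalibration models from simple averaging.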

Cite

Text

Turner et al. "Forecast Aggregation via Recalibration." Machine Learning, 2014. doi:10.1007/s10994-013-5401-4

Markdown

[Turner et al. "Forecast Aggregation via Recalibration." Machine Learning, 2014.](https://mlanthology.org/mlj/2014/turner2014mlj-forecast/) doi:10.1007/s10994-013-5401-4

BibTeX

@article{turner2014mlj-forecast,
  title     = {{Forecast Aggregation via Recalibration}},
  author    = {Turner, Brandon M. and Steyvers, Mark and Merkle, Edgar C. and Budescu, David V. and Wallsten, Thomas S.},
  journal   = {Machine Learning},
  year      = {2014},
  pages     = {261--289},
  doi       = {10.1007/s10994-013-5401-4},
  volume    = {95},
  url       = {https://mlanthology.org/mlj/2014/turner2014mlj-forecast/}
}