The Neural Testbed: Evaluating Joint Predictions

Abstract

Predictive distributions quantify uncertainties ignored by point estimates. This paper introduces The Neural Testbed: an open-source benchmark for controlled and principled evaluation of agents that generate such predictions. Crucially, the testbed assesses agents not only on the quality of their marginal predictions per input, but also on their joint predictions across many inputs. We evaluate a range of agents using a simple neural network data generating process. Our results indicate that some popular Bayesian deep learning agents do not fare well with joint predictions, even when they can produce accurate marginal predictions. We also show that the quality of joint predictions drives performance in downstream decision tasks. We find these results are robust across a wide range of generative models, and highlight the practical importance of joint predictions to the community.
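To make the marginal/joint distinction concrete, here is a minimal sketch contrasting the two log-losses for an ensemble-style agent. This is an illustrative assumption, not the testbed's actual API: variable names, shapes, and the ensemble setup are hypothetical.

```python
# Minimal sketch (not the Neural Testbed API): marginal vs. joint log-loss
# for an ensemble classifier on a batch of tau inputs. All names/shapes
# here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Suppose an ensemble agent outputs class probabilities for tau inputs:
# one (tau, num_classes) table per ensemble member.
num_members, tau, num_classes = 10, 5, 2
ensemble_probs = rng.dirichlet(np.ones(num_classes), size=(num_members, tau))
labels = rng.integers(num_classes, size=tau)  # true labels for the batch

# Marginal log-loss: average the ensemble per input, then score each input
# independently. This ignores dependencies between inputs.
marginal_probs = ensemble_probs.mean(axis=0)            # (tau, num_classes)
marginal_nll = -np.log(marginal_probs[np.arange(tau), labels]).mean()

# Joint log-loss: each member assigns a probability to the whole batch of
# labels at once; average those joint probabilities across members. An agent
# with accurate marginals can still score poorly here if it misrepresents
# correlations between inputs.
member_joint = ensemble_probs[:, np.arange(tau), labels].prod(axis=1)
joint_nll = -np.log(member_joint.mean())

print(f"marginal NLL per input: {marginal_nll:.3f}")
print(f"joint NLL over batch of {tau}: {joint_nll:.3f}")
```

The key design point the sketch illustrates: averaging probabilities before scoring (marginal) discards the correlation structure that averaging whole-batch probabilities across members (joint) retains, which is exactly the gap the testbed is built to measure.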

Cite

Text

Osband et al. "The Neural Testbed: Evaluating Joint Predictions." Neural Information Processing Systems, 2022.

Markdown

[Osband et al. "The Neural Testbed: Evaluating Joint Predictions." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/osband2022neurips-neural/)

BibTeX

@inproceedings{osband2022neurips-neural,
  title     = {{The Neural Testbed: Evaluating Joint Predictions}},
  author    = {Osband, Ian and Wen, Zheng and Asghari, Seyed Mohammad and Dwaracherla, Vikranth and Lu, Xiuyuan and Ibrahimi, Morteza and Lawson, Dieterich and Hao, Botao and O'Donoghue, Brendan and Van Roy, Benjamin},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/osband2022neurips-neural/}
}