Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning Under Distribution Shift

Abstract

Bayesian deep learning (BDL) is a promising approach for achieving well-calibrated predictions on distribution-shifted data. Nevertheless, there exists no large-scale survey that systematically evaluates recent state-of-the-art methods on diverse, realistic, and challenging benchmark tasks. To provide a clear picture of the current state of BDL research, we evaluate modern BDL algorithms on real-world datasets from the WILDS collection containing challenging classification and regression tasks, with a focus on generalization capability and calibration under distribution shift. We compare the algorithms on a wide range of large convolutional and transformer-based neural network architectures. In particular, we investigate a signed version of the expected calibration error that reveals whether the methods are over- or underconfident, providing further insight into their behavior. Further, we provide the first systematic evaluation of BDL for fine-tuning large pre-trained models, where training from scratch is prohibitively expensive. Finally, given the recent success of Deep Ensembles, we extend popular single-mode posterior approximations to multiple modes using ensembles. While we find that ensembling single-mode approximations generally improves the generalization capability and calibration of the models by a significant margin, we also identify a failure mode of ensembles when fine-tuning large transformer-based language models. In this setting, approaches based on variational inference, such as last-layer Bayes By Backprop, outperform other methods in terms of accuracy by a large margin, while modern approximate inference algorithms such as SWAG achieve the best calibration.
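
As a rough illustration, a signed calibration error can be understood as the standard binned expected calibration error with the absolute value removed from the per-bin confidence-accuracy gap, so that the sign indicates the direction of miscalibration (positive for overconfidence, negative for underconfidence). Below is a minimal NumPy sketch of this idea, assuming equal-width confidence bins over top-label confidences; the paper's exact definition may differ in details such as the binning scheme.

import numpy as np

def signed_ece(confidences, correct, n_bins=15):
    """Sketch of a signed expected calibration error.

    Like standard binned ECE, but without the absolute value in the
    per-bin term, so positive values indicate overconfidence and
    negative values indicate underconfidence.

    confidences: top-label predicted probabilities, shape (n,)
    correct:     1 if the prediction was correct, else 0, shape (n,)
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            weight = mask.mean()  # |B| / n, the fraction of samples in this bin
            gap = confidences[mask].mean() - correct[mask].mean()
            ece += weight * gap
    return ece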
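
Similarly, extending a single-mode posterior approximation to multiple modes via ensembling amounts, at prediction time, to a uniform mixture of the members' predictive distributions. A minimal sketch, assuming each independently trained member already outputs class probabilities:

def ensemble_predict(member_probs):
    """Uniform mixture over ensemble members (sketch).

    member_probs: class probabilities from each member,
                  shape (n_members, n_samples, n_classes)
    returns:      averaged predictive distribution,
                  shape (n_samples, n_classes)
    """
    return np.mean(np.asarray(member_probs), axis=0)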

Cite

Text

Seligmann et al. "Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning Under Distribution Shift." Neural Information Processing Systems, 2023.

Markdown

[Seligmann et al. "Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning Under Distribution Shift." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/seligmann2023neurips-beyond/)

BibTeX

@inproceedings{seligmann2023neurips-beyond,
  title     = {{Beyond Deep Ensembles: A Large-Scale Evaluation of Bayesian Deep Learning Under Distribution Shift}},
  author    = {Seligmann, Florian and Becker, Philipp and Volpp, Michael and Neumann, Gerhard},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/seligmann2023neurips-beyond/}
}