Reliability Benchmarks for Image Segmentation

Abstract

Recent work has shown the importance of reliability, where model performance is assessed under stress conditions pervasive in real-world deployment. In this work, we examine reliability tasks in the setting of semantic segmentation, a dense output problem that has typically only been evaluated using in-distribution predictive performance, such as the mean intersection over union (mIoU) score on the Cityscapes validation set. To reduce the gap toward reliable deployment in the real world, we compile a benchmark involving existing (and newly constructed) distribution shifts and metrics. We evaluate current models and several baselines to determine how well segmentation models make robust predictions across multiple types of distribution shift and flag when they don't know.
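The mIoU metric mentioned in the abstract can be sketched as follows; this is a minimal illustrative implementation, not the paper's evaluation code, and the choice to skip classes absent from both prediction and ground truth is an assumption (evaluation suites differ on this point).

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection over union over classes (illustrative sketch).

    pred, target: integer class-label arrays of the same shape.
    Classes absent from both pred and target are skipped (an assumption;
    benchmark implementations vary in how they handle empty classes).
    """
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class appears in neither map; skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

# Tiny usage example on a 2x2 label map with 2 classes:
pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
score = mean_iou(pred, target, num_classes=2)
# class 0: IoU = 1/2; class 1: IoU = 2/3; mean = 7/12
```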

Cite

Text

Buchanan et al. "Reliability Benchmarks for Image Segmentation." NeurIPS 2022 Workshops: DistShift, 2022.

Markdown

[Buchanan et al. "Reliability Benchmarks for Image Segmentation." NeurIPS 2022 Workshops: DistShift, 2022.](https://mlanthology.org/neuripsw/2022/buchanan2022neuripsw-reliability/)

BibTeX

@inproceedings{buchanan2022neuripsw-reliability,
  title     = {{Reliability Benchmarks for Image Segmentation}},
  author    = {Buchanan, E. Kelly and Dusenberry, Michael W. and Ren, Jie and Murphy, Kevin Patrick and Lakshminarayanan, Balaji and Tran, Dustin},
  booktitle = {NeurIPS 2022 Workshops: DistShift},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/buchanan2022neuripsw-reliability/}
}