Overcoming Common Flaws in the Evaluation of Selective Classification Systems

Abstract

Selective Classification, wherein models can reject low-confidence predictions, promises reliable translation of machine-learning based classification systems to real-world scenarios such as clinical diagnostics. While current evaluation of these systems typically assumes fixed working points based on pre-defined rejection thresholds, methodological progress requires benchmarking the general performance of systems akin to the $\mathrm{AUROC}$ in standard classification. In this work, we define 5 requirements for multi-threshold metrics in selective classification regarding task alignment, interpretability, and flexibility, and show how current approaches fail to meet them. We propose the Area under the Generalized Risk Coverage curve ($\mathrm{AUGRC}$), which meets all requirements and can be directly interpreted as the average risk of undetected failures. We empirically demonstrate the relevance of $\mathrm{AUGRC}$ on a comprehensive benchmark spanning 6 data sets and 13 confidence scoring functions. We find that the proposed metric substantially changes metric rankings on 5 out of the 6 data sets.

PDF NeurIPS OpenReview Semantic Scholar

Cite

Text

Traub et al. "Overcoming Common Flaws in the Evaluation of Selective Classification Systems." Neural Information Processing Systems, 2024. doi:10.52202/079017-0076

Markdown

[Traub et al. "Overcoming Common Flaws in the Evaluation of Selective Classification Systems." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/traub2024neurips-overcoming/) doi:10.52202/079017-0076

BibTeX

@inproceedings{traub2024neurips-overcoming,
  title     = {{Overcoming Common Flaws in the Evaluation of Selective Classification Systems}},
  author    = {Traub, Jeremias and Bungert, Till J. and Lüth, Carsten T. and Baumgartner, Michael and Maier-Hein, Klaus H. and Maier-Hein, Lena and Jäger, Paul F.},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0076},
  url       = {https://mlanthology.org/neurips/2024/traub2024neurips-overcoming/}
}