A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

Abstract

How many labeled examples are needed to estimate a classifier's performance on a new dataset? We study the case where data is plentiful, but labels are expensive. We show that by making a few reasonable assumptions on the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground truth labels. Our approach, which we call Semisupervised Performance Evaluation (SPE), is based on a generative model for the classifier's confidence scores. In addition to estimating the performance of classifiers on new datasets, SPE can be used to recalibrate a classifier by reestimating the class-conditional confidence distributions.
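
To make the idea concrete, below is a minimal sketch (not the authors' implementation) of the kind of semisupervised estimate the abstract describes: classifier scores are modeled as a two-component mixture of class-conditional distributions, a handful of ground truth labels pin down the component assignments for the labeled points, and EM fills in the rest. The Gaussian score model, the function names (spe_em, precision_recall), and the label convention (-1 for unlabeled) are illustrative assumptions; the paper's generative model and its posterior-based confidence bounds are not reproduced here.

import numpy as np
from scipy.stats import norm

def spe_em(scores, labels, n_iter=200):
    """Semisupervised EM for a two-component Gaussian mixture over
    classifier scores. `labels` holds 0/1 for the few labeled examples
    and -1 for unlabeled ones (an assumed convention for this sketch)."""
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels)
    lab = y >= 0

    # Responsibilities: fixed to the ground truth where labels exist,
    # uniform elsewhere.
    r = np.full(len(s), 0.5)
    r[lab] = y[lab]

    for _ in range(n_iter):
        # M-step: mixture weight and class-conditional Gaussian parameters.
        pi = r.mean()
        mu1, mu0 = np.average(s, weights=r), np.average(s, weights=1 - r)
        sd1 = np.sqrt(np.average((s - mu1) ** 2, weights=r)) + 1e-6
        sd0 = np.sqrt(np.average((s - mu0) ** 2, weights=1 - r)) + 1e-6

        # E-step: posterior of the positive class given the score; this
        # posterior also serves as the recalibrated classifier output.
        p1 = pi * norm.pdf(s, mu1, sd1)
        p0 = (1 - pi) * norm.pdf(s, mu0, sd0)
        r = p1 / (p1 + p0 + 1e-300)
        r[lab] = y[lab]  # labeled points keep their ground truth

    return pi, (mu0, sd0), (mu1, sd1)

def precision_recall(threshold, pi, neg, pos):
    """Estimated precision and recall at a score threshold, computed
    from the fitted class-conditional score distributions."""
    tp = pi * (1 - norm.cdf(threshold, *pos))
    fp = (1 - pi) * (1 - norm.cdf(threshold, *neg))
    fn = pi * norm.cdf(threshold, *pos)
    return tp / (tp + fp + 1e-12), tp / (tp + fn + 1e-12)

Sweeping precision_recall over thresholds traces out an estimated performance curve from many unlabeled scores plus a few labels; confidence bounds, which the paper obtains from the posterior over model parameters, would require an additional sampling step not shown in this sketch.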

Cite

Text

Welinder et al. "A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration." Conference on Computer Vision and Pattern Recognition, 2013.

Markdown

[Welinder et al. "A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration." Conference on Computer Vision and Pattern Recognition, 2013.](https://mlanthology.org/cvpr/2013/welinder2013cvpr-lazy/)

BibTeX

@inproceedings{welinder2013cvpr-lazy,
  title     = {{A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration}},
  author    = {Welinder, Peter and Welling, Max and Perona, Pietro},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2013},
  url       = {https://mlanthology.org/cvpr/2013/welinder2013cvpr-lazy/}
}