Don't Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget

Abstract

We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to this conventional wisdom: if the goal is to identify the better of two classifiers, it's best to spend the budget on collecting a single label for more samples. We discuss the implications of our work for the design of machine learning benchmarks, where our results overturn some time-honored recommendations.
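The claim can be illustrated with a small Monte Carlo sketch. Under a fixed label budget, we compare two strategies: one noisy label per point on many points, versus a 3-label majority vote on a third as many points. The classifier accuracies and annotator noise rate below are illustrative assumptions, not values from the paper, and the two classifiers' errors are drawn independently for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)

def trial(budget, labels_per_point, acc1=0.75, acc2=0.65, noise=0.2):
    """One comparison trial: True iff classifier 1 scores higher on the test set."""
    n = budget // labels_per_point  # number of points the budget affords
    # Whether each classifier's prediction is correct on each point
    # (independent across classifiers here, purely for illustration).
    c1 = rng.random(n) < acc1
    c2 = rng.random(n) < acc2
    # Each annotator label is correct with probability 1 - noise;
    # majority-vote labels_per_point labels per point (odd k, so no ties).
    votes = rng.random((n, labels_per_point)) < (1 - noise)
    agg_correct = votes.sum(axis=1) * 2 > labels_per_point
    # Measured accuracy = agreement with the (possibly wrong) aggregated label.
    score1 = (c1 == agg_correct).mean()
    score2 = (c2 == agg_correct).mean()
    return score1 > score2

def win_rate(budget, labels_per_point, trials=2000):
    """Fraction of trials in which the truly better classifier is identified."""
    return np.mean([trial(budget, labels_per_point) for _ in range(trials)])
```

With these assumed parameters, `win_rate(300, 1)` (300 points, one label each) identifies the better classifier more often than `win_rate(300, 3)` (100 points, 3-label majority vote), matching the paper's message that quantity beats quality for this comparison task.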

Cite

Text

Dorner and Hardt. "Don't Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget." ICLR 2024 Workshops: DPFM, 2024.

Markdown

[Dorner and Hardt. "Don't Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget." ICLR 2024 Workshops: DPFM, 2024.](https://mlanthology.org/iclrw/2024/dorner2024iclrw-don/)

BibTeX

@inproceedings{dorner2024iclrw-don,
  title     = {{Don't Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget}},
  author    = {Dorner, Florian E. and Hardt, Moritz},
  booktitle = {ICLR 2024 Workshops: DPFM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/dorner2024iclrw-don/}
}