Don’t Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget
Abstract
We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It’s common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it’s best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér’s theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding’s bound.
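To make the headline trade-off concrete, the following is a minimal Monte Carlo sketch, not the paper's analysis: two hypothetical classifiers with true accuracies 0.85 and 0.80 are compared against labels corrupted at an assumed 10% noise rate, once with a single annotation per test point and once with 3- or 5-fold majority votes, all under the same total annotation budget. All parameter values and the independence assumptions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def trial(budget, labels_per_point, p_a=0.85, p_b=0.80, noise=0.1):
    """One simulated benchmark run; returns True if the better
    classifier (A) wins the empirical comparison."""
    n = budget // labels_per_point          # test points we can afford
    y = rng.integers(0, 2, size=n)          # latent ground-truth labels

    # Aggregate noisy annotations by majority vote (k = 1 means a single label).
    flips = rng.random((n, labels_per_point)) < noise
    votes = (y[:, None] ^ flips).mean(axis=1)
    y_hat = (votes > 0.5).astype(int)       # odd k avoids ties

    # Each classifier predicts the true label correctly with probability p,
    # independently across points (a simplifying assumption).
    pred_a = np.where(rng.random(n) < p_a, y, 1 - y)
    pred_b = np.where(rng.random(n) < p_b, y, 1 - y)

    # Rank the classifiers by empirical accuracy against the aggregated labels.
    return (pred_a == y_hat).mean() > (pred_b == y_hat).mean()

budget, reps = 3000, 2000
for k in (1, 3, 5):
    wins = sum(trial(budget, k) for _ in range(reps))
    print(f"k={k} labels/point: P(correctly rank A over B) ~ {wins / reps:.3f}")
```

Under these assumptions, the single-label strategy (k = 1) typically identifies the better classifier at least as reliably as the majority-vote strategies, since the modest reduction in label noise does not compensate for testing on one third or one fifth as many points.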
Cite
Text
Dorner and Hardt. "Don’t Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget." International Conference on Machine Learning, 2024.
Markdown
[Dorner and Hardt. "Don’t Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/dorner2024icml-dont/)
BibTeX
@inproceedings{dorner2024icml-dont,
title = {{Don’t Label Twice: Quantity Beats Quality When Comparing Binary Classifiers on a Budget}},
author = {Dorner, Florian E. and Hardt, Moritz},
booktitle = {International Conference on Machine Learning},
year = {2024},
pages = {11544--11572},
volume = {235},
url = {https://mlanthology.org/icml/2024/dorner2024icml-dont/}
}