Evaluating Classifiers by Means of Test Data with Noisy Labels
Abstract
Often the most expensive and time-consuming task in building a pattern recognition system is collecting and accurately labeling training and testing data. In this paper, we explore the use of inexpensive noisy testing data for evaluating a classifier's performance. We assume 1) the (human) labeler provides category labels with a known mislabeling rate and 2) the trained classifier and the labeler are statistically independent. We then derive the number of "noisy" test samples that are, on average, equivalent to a single perfectly labeled test sample for the task of evaluating the classifier's performance. For practical and realistic error and mislabeling rates, this number of equivalent test patterns can be surprisingly low. We also derive upper and lower bounds for the true error rate when the labeler and the classifier are not independent.
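The abstract's "equivalent number of noisy test samples" can be illustrated with a back-of-the-envelope variance calculation. The sketch below is not the paper's derivation; it assumes a symmetric noise model in which, for true error rate p and labeler mislabeling rate alpha, the observed classifier–labeler disagreement rate is p_obs = p(1 − alpha) + (1 − p)alpha, and it compares the per-sample variance of the debiased noisy estimator with that of a clean-label estimator:

```python
def equivalent_noisy_samples(p, alpha):
    """Rough number of noisy-label test samples whose information about the
    true error rate p matches one perfectly labeled sample.

    Assumes classifier and labeler are independent and the labeler flips
    labels with known rate alpha, so the observed disagreement rate is
        p_obs = p * (1 - alpha) + (1 - p) * alpha.
    Inverting gives the unbiased estimate p_hat = (p_obs_hat - alpha) / (1 - 2*alpha),
    with per-sample variance p_obs * (1 - p_obs) / (1 - 2*alpha)**2.
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("mislabeling rate must lie in [0, 0.5)")
    p_obs = p * (1 - alpha) + (1 - p) * alpha
    var_noisy = p_obs * (1 - p_obs) / (1 - 2 * alpha) ** 2  # per noisy sample
    var_clean = p * (1 - p)                                 # per clean sample
    return var_noisy / var_clean

# Example (hypothetical rates): a 5% error rate evaluated by a labeler with
# a 5% mislabeling rate needs only a little over two noisy samples per
# perfectly labeled one -- consistent with "surprisingly low".
print(equivalent_noisy_samples(0.05, 0.05))
```

With alpha = 0, the ratio is exactly 1, as expected: clean labels are their own equivalent.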
Cite
Lam and Stork. "Evaluating Classifiers by Means of Test Data with Noisy Labels." International Joint Conference on Artificial Intelligence, 2003. https://mlanthology.org/ijcai/2003/lam2003ijcai-evaluating/
BibTeX
@inproceedings{lam2003ijcai-evaluating,
title = {{Evaluating Classifiers by Means of Test Data with Noisy Labels}},
author = {Lam, Chuck P. and Stork, David G.},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2003},
pages = {513-518},
url = {https://mlanthology.org/ijcai/2003/lam2003ijcai-evaluating/}
}