Identifying and Eliminating Mislabeled Training Instances

Abstract

This paper presents a new approach to identifying and eliminating mislabeled training instances. The goal of this technique is to improve the classification accuracy of learning algorithms by improving the quality of the training data. The approach employs an ensemble of classifiers that serves as a filter for the training data. Using n-fold cross-validation, the training data are passed through the filter. Only instances that the filter classifies correctly are passed to the final learning algorithm. We present an empirical evaluation of the approach for the task of automated land cover mapping from remotely sensed data. Labeling error arises in these data from a multitude of sources, including lack of consistency in the vegetation classification used, variable measurement techniques, and variation in the spatial sampling resolution. Our evaluation shows that for noise levels of less than 40%, filtering results in higher predictive accuracy than not filtering, and for levels of...
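The filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base classifiers (1-nearest-neighbor, 3-nearest-neighbor, nearest-centroid), the majority-vote rule, the names `filter_mislabeled`, `knn`, `nearest_centroid`, and `min_agree`, and the toy data are all hypothetical choices made for the example.

```python
import math
from collections import Counter

# Each instance is (features, label), with features given as a tuple of floats.
# All classifiers here are illustrative stand-ins, not those used in the paper.

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = {}, Counter()
    for features, label in train:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    centroids = {c: tuple(s / counts[c] for s in sums[c]) for c in sums}
    return min(centroids, key=lambda c: math.dist(centroids[c], x))

def knn(k):
    """Build a k-nearest-neighbor predictor with the same (train, x) interface."""
    def predict(train, x):
        nearest = sorted(train, key=lambda inst: math.dist(inst[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def filter_mislabeled(data, classifiers, n_folds=5, min_agree=None):
    """Split data into (kept, removed) using an n-fold ensemble filter.

    Each instance is classified by models trained on the other folds only.
    It is kept when at least `min_agree` classifiers reproduce its label
    (assumed default: a simple majority of the ensemble).
    """
    if min_agree is None:
        min_agree = len(classifiers) // 2 + 1
    kept, removed = [], []
    for fold in range(n_folds):
        held_out = data[fold::n_folds]
        train = [inst for i, inst in enumerate(data) if i % n_folds != fold]
        for features, label in held_out:
            votes = sum(clf(train, features) == label for clf in classifiers)
            (kept if votes >= min_agree else removed).append((features, label))
    return kept, removed

# Toy data: two well-separated clusters, with one point near cluster "a"
# deliberately given the wrong label "b".
data = [
    ((0.0, 0.0), "a"), ((0.4, 0.0), "a"), ((0.0, 0.4), "a"), ((0.4, 0.4), "a"),
    ((1.0, 1.0), "b"),  # mislabeled on purpose
    ((5.0, 5.0), "b"), ((5.4, 5.0), "b"), ((5.0, 5.4), "b"), ((5.4, 5.4), "b"),
    ((5.2, 5.2), "b"),
]

ensemble = [knn(1), knn(3), nearest_centroid]
kept, removed = filter_mislabeled(data, ensemble, n_folds=5)
print(removed)  # [((1.0, 1.0), 'b')]
```

Only the `kept` instances would then be handed to the final learning algorithm. Setting `min_agree=len(ensemble)` would give a stricter variant that keeps an instance only when every classifier reproduces its label; which rule is preferable depends on how costly it is to discard correctly labeled data.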

Cite

Text

Brodley and Friedl. "Identifying and Eliminating Mislabeled Training Instances." AAAI Conference on Artificial Intelligence, 1996.

Markdown

[Brodley and Friedl. "Identifying and Eliminating Mislabeled Training Instances." AAAI Conference on Artificial Intelligence, 1996.](https://mlanthology.org/aaai/1996/brodley1996aaai-identifying/)

BibTeX

@inproceedings{brodley1996aaai-identifying,
  title     = {{Identifying and Eliminating Mislabeled Training Instances}},
  author    = {Brodley, Carla E. and Friedl, Mark A.},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1996},
  pages     = {799--805},
  url       = {https://mlanthology.org/aaai/1996/brodley1996aaai-identifying/}
}