Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets

Abstract

The class imbalance problem causes a classifier to over-fit the data belonging to the class with the greatest number of training examples. The purpose of this paper is to argue that methods that equalize class membership are not as effective as possible when applied blindly and that improvements can be obtained by adjusting for the within-class imbalance. A guided resampling technique is proposed and tested within a simpler letter recognition domain and a more difficult text classification domain. A fast unsupervised clustering technique, Principal Direction Divisive Partitioning (PDDP), is used to determine the internal characteristics of each class. Performance in categories that suffer from a large between-class imbalance (few positive examples) is shown to improve when the guided resampling method is used.
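To make the idea of cluster-guided resampling concrete, here is a minimal sketch and not the authors' implementation. It assumes a single PDDP split (rows are divided by the sign of their projection onto the leading principal direction of the centered data) and assumes that "guided" resampling means oversampling each discovered cluster, with replacement, to a common size. The function names `pddp_split` and `guided_oversample`, the toy data, and the target size are all hypothetical choices made for illustration.

```python
import numpy as np

def pddp_split(X):
    """One PDDP-style step: split rows of X by the sign of their projection
    onto the leading principal direction of the mean-centered data."""
    centered = X - X.mean(axis=0)
    # The first right singular vector is the leading principal direction.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt[0]
    return proj <= 0  # boolean mask selecting one of the two child clusters

def guided_oversample(X, cluster_ids, target, rng):
    """Assumed resampling rule: draw `target` rows with replacement from
    every cluster so that all clusters end up the same size."""
    parts = []
    for c in np.unique(cluster_ids):
        idx = np.flatnonzero(cluster_ids == c)
        parts.append(X[rng.choice(idx, size=target, replace=True)])
    return np.vstack(parts)

rng = np.random.default_rng(0)
# Hypothetical minority class with two unequal sub-clusters
# (a within-class imbalance of 20 examples versus 4).
minority = np.vstack([
    rng.normal(0.0, 0.1, (20, 2)),
    rng.normal(5.0, 0.1, (4, 2)),
])
cluster_ids = pddp_split(minority).astype(int)
balanced = guided_oversample(minority, cluster_ids, target=20, rng=rng)
```

In this toy setting the split separates the two sub-clusters, and the small sub-cluster is oversampled to match the large one before any between-class balancing is applied. A full PDDP run would apply the split recursively and would need a stopping rule, which the abstract does not specify.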

Cite

Text

Nickerson et al. "Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets." Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001.

Markdown

[Nickerson et al. "Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets." Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, 2001.](https://mlanthology.org/aistats/2001/nickerson2001aistats-using/)

BibTeX

@inproceedings{nickerson2001aistats-using,
  title     = {{Using Unsupervised Learning to Guide Resampling in Imbalanced Data Sets}},
  author    = {Nickerson, Adam and Japkowicz, Nathalie and Milios, Evangelos E.},
  booktitle = {Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics},
  year      = {2001},
  pages     = {224--228},
  volume    = {R3},
  url       = {https://mlanthology.org/aistats/2001/nickerson2001aistats-using/}
}