Severe Class Imbalance: Why Better Algorithms Aren't the Answer

Abstract

This paper argues that severe class imbalance is not merely an interesting technical challenge that improved learning algorithms will address; it is much more serious. To be useful, a classifier must appreciably outperform a trivial solution, such as always choosing the majority class. Any application that is inherently noisy limits the error rate, and cost, that is achievable. When the data are normally distributed, even a Bayes optimal classifier achieves only a vanishingly small reduction in the majority classifier’s error rate, and cost, as imbalance increases. For fat-tailed distributions, and when practical classifiers are used, often no reduction is achieved.
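The abstract's central claim can be illustrated numerically. The sketch below (not from the paper; the two-Gaussian setup and the class separation d are illustrative assumptions) compares the error of the trivial majority classifier with that of a Bayes optimal classifier for two unit-variance Gaussian class conditionals, N(0,1) for the majority class and N(d,1) for the minority class, as the minority prior p shrinks:

```python
import math

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bayes_error(p, d):
    """Bayes error for majority ~ N(0,1) vs minority ~ N(d,1), minority prior p.

    The Bayes rule predicts the minority class when
    p * phi(x - d) > (1 - p) * phi(x), i.e. when x exceeds the threshold t.
    """
    t = d / 2.0 + math.log((1.0 - p) / p) / d
    # Error = P(majority misclassified) + P(minority misclassified)
    return (1.0 - p) * (1.0 - Phi(t)) + p * Phi(t - d)

# The majority classifier's error rate is simply p; as p shrinks,
# the Bayes optimal classifier's relative improvement over it vanishes.
for p in [0.1, 0.01, 0.001, 0.0001]:
    e_maj = p
    e_bayes = bayes_error(p, d=2.0)
    reduction = 1.0 - e_bayes / e_maj
    print(f"p={p:<8} majority err={e_maj:<8} Bayes err={e_bayes:.6f} "
          f"relative reduction={reduction:.4f}")
```

With d = 2.0 the relative reduction is substantial at p = 0.1 but drops below one percent by p = 0.001, matching the abstract's point that under severe imbalance even the Bayes optimal classifier barely beats the majority baseline.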

Cite

Text

Drummond and Holte. "Severe Class Imbalance: Why Better Algorithms Aren't the Answer." European Conference on Machine Learning, 2005. doi:10.1007/11564096_52

Markdown

[Drummond and Holte. "Severe Class Imbalance: Why Better Algorithms Aren't the Answer." European Conference on Machine Learning, 2005.](https://mlanthology.org/ecmlpkdd/2005/drummond2005ecml-severe/) doi:10.1007/11564096_52

BibTeX

@inproceedings{drummond2005ecml-severe,
  title     = {{Severe Class Imbalance: Why Better Algorithms Aren't the Answer}},
  author    = {Drummond, Chris and Holte, Robert C.},
  booktitle = {European Conference on Machine Learning},
  year      = {2005},
  pages     = {539--546},
  doi       = {10.1007/11564096_52},
  url       = {https://mlanthology.org/ecmlpkdd/2005/drummond2005ecml-severe/}
}