SMOTE: Synthetic Minority Over-Sampling Technique

Abstract

An approach to the construction of classifiers from imbalanced datasets is described. A dataset is imbalanced if the classification categories are not approximately equally represented. Often real-world data sets are predominantly composed of "normal" examples with only a small percentage of "abnormal" or "interesting" examples. It is also the case that the cost of misclassifying an abnormal (interesting) example as a normal example is often much higher than the cost of the reverse error. Under-sampling of the majority (normal) class has been proposed as a good means of increasing the sensitivity of a classifier to the minority class. This paper shows that a combination of our method of over-sampling the minority (abnormal) class and under-sampling the majority (normal) class can achieve better classifier performance (in ROC space) than only under-sampling the majority class. This paper also shows that a combination of our method of over-sampling the minority class and under-sampling the majority class can achieve better classifier performance (in ROC space) than varying the loss ratios in Ripper or class priors in Naive Bayes. Our method of over-sampling the minority class involves creating synthetic minority class examples. Experiments are performed using C4.5, Ripper and a Naive Bayes classifier. The method is evaluated using the area under the Receiver Operating Characteristic curve (AUC) and the ROC convex hull strategy.
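The abstract does not spell out how the synthetic minority examples are created or how the two sampling steps are combined. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes each synthetic point is an interpolation between a minority example and one of its k nearest minority neighbours, uses a single random gap per point, and picks base examples at random. The function names `smote_sketch` and `undersample`, and the parameters `k`, `fraction`, and `seed`, are hypothetical.

```python
import math
import random


def smote_sketch(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours.

    Assumption: numeric features only, Euclidean distance, one random gap
    per synthetic point. `minority` is a list of equal-length tuples.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority examples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Candidate neighbours: every other minority example, nearest first.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def undersample(majority, fraction, seed=0):
    """Keep a random subset of the majority class (at least one example)."""
    rng = random.Random(seed)
    keep = max(1, round(len(majority) * fraction))
    return rng.sample(majority, keep)


# Toy data: 4 minority points on the unit square, 10 majority points.
minority = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
majority = [(float(i), 5.0) for i in range(10)]

new_minority = minority + smote_sketch(minority, n_synthetic=6, k=2)
new_majority = undersample(majority, fraction=0.5)
```

Because every synthetic point lies on a segment between two existing minority examples, the new points stay inside the region the minority class already occupies rather than duplicating existing examples exactly. The rebalanced lists would then be passed to whichever classifier is being trained.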

Cite

Text

Chawla et al. "SMOTE: Synthetic Minority Over-Sampling Technique." Journal of Artificial Intelligence Research, 2002. doi:10.1613/JAIR.953

Markdown

[Chawla et al. "SMOTE: Synthetic Minority Over-Sampling Technique." Journal of Artificial Intelligence Research, 2002.](https://mlanthology.org/jair/2002/chawla2002jair-smote/) doi:10.1613/JAIR.953

BibTeX

@article{chawla2002jair-smote,
  title     = {{SMOTE: Synthetic Minority Over-Sampling Technique}},
  author    = {Chawla, Nitesh V. and Bowyer, Kevin W. and Hall, Lawrence O. and Kegelmeyer, W. Philip},
  journal   = {Journal of Artificial Intelligence Research},
  year      = {2002},
  pages     = {321-357},
  doi       = {10.1613/JAIR.953},
  volume    = {16},
  url       = {https://mlanthology.org/jair/2002/chawla2002jair-smote/}
}