RB-CCR: Radial-Based Combined Cleaning and Resampling Algorithm for Imbalanced Data Classification
Abstract
Real-world classification domains, such as medicine, health and safety, and finance, often exhibit imbalanced class priors and have asynchronous misclassification costs. In such cases, the classification model must achieve a high recall without significantly impacting precision. Resampling the training data is the standard approach to improving classification performance on imbalanced binary data. However, the state-of-the-art methods ignore the local joint distribution of the data or correct it as a post-processing step. This can cause sub-optimal shifts in the training distribution, particularly when the target data distribution is complex. In this paper, we propose Radial-Based Combined Cleaning and Resampling (RB-CCR). RB-CCR utilizes the concept of class potential to refine the energy-based resampling approach of CCR. In particular, RB-CCR exploits the class potential to accurately locate sub-regions of the data-space for synthetic oversampling. The category sub-region for oversampling can be specified as an input parameter to meet domain-specific needs or be automatically selected via cross-validation. Our 5×2 cross-validated results on 57 benchmark binary datasets with 9 classifiers show that RB-CCR achieves a better precision-recall trade-off than CCR and generally out-performs the state-of-the-art resampling methods in terms of AUC and G-mean.
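The abstract reports results in terms of G-mean. As a minimal sketch, assuming the common definition of G-mean as the geometric mean of recall on the minority (positive) class and specificity on the majority class, the metric can be computed as below. The function name `g_mean` and the toy labels are illustrative only and are not taken from the paper.

```python
import math


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of recall (positive class) and specificity (negative class).

    Assumed definition: sqrt(TP / (TP + FN) * TN / (TN + FP)).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Guard against an empty class to avoid division by zero.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(recall * specificity)


# Toy imbalanced example: 2 minority (1) and 8 majority (0) samples.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
score = g_mean(y_true, y_pred)

# A classifier that always predicts the majority class scores zero.
majority_only = g_mean(y_true, [0] * len(y_true))
```

In this toy case, always predicting the majority class would reach 80% accuracy yet a G-mean of zero, which is why a metric of this kind is typically preferred over accuracy on imbalanced data.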
Cite
Text
Koziarski et al. "RB-CCR: Radial-Based Combined Cleaning and Resampling Algorithm for Imbalanced Data Classification." Machine Learning, 2021. doi:10.1007/S10994-021-06012-8
Markdown
[Koziarski et al. "RB-CCR: Radial-Based Combined Cleaning and Resampling Algorithm for Imbalanced Data Classification." Machine Learning, 2021.](https://mlanthology.org/mlj/2021/koziarski2021mlj-rbccr/) doi:10.1007/S10994-021-06012-8
BibTeX
@article{koziarski2021mlj-rbccr,
title = {{RB-CCR: Radial-Based Combined Cleaning and Resampling Algorithm for Imbalanced Data Classification}},
author = {Koziarski, Michal and Bellinger, Colin and Wozniak, Michal},
journal = {Machine Learning},
year = {2021},
pages = {3059-3093},
doi = {10.1007/S10994-021-06012-8},
volume = {110},
url = {https://mlanthology.org/mlj/2021/koziarski2021mlj-rbccr/}
}