GMMSampling: A New Model-Based, Data Difficulty-Driven Resampling Method for Multi-Class Imbalanced Data
Abstract
Learning from multi-class imbalanced data still receives limited research attention. Most of the proposed methods focus only on the global class imbalance ratio. However, experimental studies have demonstrated that the imbalance ratio itself is not the main source of difficulty in imbalanced learning. It is the combination of the imbalance ratio with other data difficulty factors, such as class overlapping or the decomposition of the minority class into several subconcepts, that significantly degrades classification performance. This paper presents GMMSampling, a new resampling method that exploits information about data difficulty factors to remove majority class instances from class overlapping regions and, at the same time, to oversample each subconcept of the minority class. The experimental evaluation demonstrated that the proposed method achieves better results in terms of G-mean, balanced accuracy, macro-AP, MCC and F-score than other related methods.
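For intuition only, the sketch below illustrates the model-based oversampling idea outlined in the abstract: a Gaussian mixture is fitted to the minority class so that each component approximates one subconcept, and synthetic instances are then drawn from every component. The function name `gmm_oversample`, the number of components, and the even split across components are illustrative assumptions; the sketch omits the cleaning of majority class instances from overlapping regions and is not the authors' algorithm.

```python
# Minimal sketch (assumptions noted above), not the paper's GMMSampling implementation.
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_oversample(X_min, n_new, n_components=3, seed=0):
    """Fit a GMM to the minority class and draw roughly n_new synthetic
    instances, split evenly across the fitted components (subconcepts)."""
    rng = np.random.default_rng(seed)
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X_min)
    per_comp = max(1, n_new // n_components)  # assumption: even split per subconcept
    synthetic = [
        rng.multivariate_normal(mean, cov, size=per_comp)
        for mean, cov in zip(gmm.means_, gmm.covariances_)
    ]
    return np.vstack([X_min] + synthetic)
```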
Cite
Text

Naglik and Lango. "GMMSampling: A New Model-Based, Data Difficulty-Driven Resampling Method for Multi-Class Imbalanced Data." Machine Learning, 2024. doi:10.1007/s10994-023-06416-8

Markdown

[Naglik and Lango. "GMMSampling: A New Model-Based, Data Difficulty-Driven Resampling Method for Multi-Class Imbalanced Data." Machine Learning, 2024.](https://mlanthology.org/mlj/2024/naglik2024mlj-gmmsampling/) doi:10.1007/s10994-023-06416-8

BibTeX
@article{naglik2024mlj-gmmsampling,
title = {{GMMSampling: A New Model-Based, Data Difficulty-Driven Resampling Method for Multi-Class Imbalanced Data}},
author = {Naglik, Iwo and Lango, Mateusz},
journal = {Machine Learning},
year = {2024},
pages = {5183--5202},
doi = {10.1007/s10994-023-06416-8},
volume = {113},
url = {https://mlanthology.org/mlj/2024/naglik2024mlj-gmmsampling/}
}