GBRF: A Novel Framework for Encoding User-Preferences in Imbalanced Data Distributions via Genetic Optimization

Carvalho, Miguel; Pinho, Armando J.; Brás, Susana

doi:10.1007/978-3-032-06096-9_4

ECML-PKDD 2025 pp. 59-77

Abstract

Resampling techniques are widely used by researchers and practitioners to address class imbalance due to their adaptability across diverse classification tasks. However, they inherently lack the ability to enforce user-defined preferences regarding model behavior after training, a feature typically exclusive to cost-sensitive learning frameworks or prediction post-processing techniques. This limitation is particularly critical in high-stakes applications, such as in the medical domain, where maximizing minority class accuracy while minimizing false negatives is essential. To overcome this constraint, we introduce the Genetic Beta Resampling Framework (GBRF), a novel, customizable and computationally efficient resampling framework that integrates user preferences into the process of synthetic data generation. GBRF leverages Genetic Algorithms to optimize two probability mass functions (PMFs) that govern the sampling probabilities of different instance groups, enabling synthetic data generation and/or instance removal. Consequently, GBRF can function as a hybrid sampling, oversampling or undersampling technique. User preferences are encoded through a parameter, $\beta$, which controls the trade-off between precision and recall. Comprehensive experiments on 60 OpenML datasets demonstrate that GBRF effectively embeds user preferences into data distributions, thus shaping model behavior accordingly. It consistently outperforms state-of-the-art resampling techniques, such as SMOTE-IPF and ProWSyn, as well as cost-sensitive classifiers, even when integrated with various classification models. Furthermore, by employing a non-instance-wise genetic optimization approach, GBRF significantly reduces the search space, achieving faster convergence to optimal solutions. Finally, since synthetic data generation is governed by two PMFs, GBRF provides an intuitive and transparent mechanism for understanding how data is generated.
Code is available at: https://github.com/MiguelCarvalhoPhD/GBRF
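
The abstract names two mechanisms: $\beta$ sets the precision/recall trade-off, and a genetic algorithm tunes two PMFs over instance groups rather than making per-instance decisions. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation (see the repository above). It assumes $\beta$ enters through the standard F-beta score used as a fitness criterion. The helper names `f_beta`, `draw_from_pmf` and `mutate_pmf`, the group labels, and the Gaussian mutation operator are all hypothetical. No classifier or full GA loop is included.

```python
import random
from collections import Counter


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score: beta > 1 favours recall, beta < 1 favours precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def draw_from_pmf(pmf: dict, n: int, rng: random.Random) -> Counter:
    """Count how many of n draws fall in each group under a PMF over groups.

    Hypothetical use: in a PMF-driven resampler, such counts could decide how
    many synthetic points are seeded from (or removed from) each group.
    """
    if any(p < 0 for p in pmf.values()) or abs(sum(pmf.values()) - 1.0) > 1e-9:
        raise ValueError("pmf must be non-negative and sum to 1")
    groups = list(pmf)
    return Counter(rng.choices(groups, weights=[pmf[g] for g in groups], k=n))


def mutate_pmf(pmf: dict, rng: random.Random, scale: float = 0.1) -> dict:
    """Generic GA mutation (assumed): Gaussian noise, clip at 0, renormalise.

    The genome holds one probability per group rather than one gene per
    instance, so its length does not grow with the dataset size.
    """
    noisy = {g: max(0.0, p + rng.gauss(0.0, scale)) for g, p in pmf.items()}
    total = sum(noisy.values())
    if total == 0.0:  # every entry clipped to zero: keep the parent unchanged
        return dict(pmf)
    return {g: p / total for g, p in noisy.items()}


if __name__ == "__main__":
    # Made-up scores: model A favours precision, model B favours recall.
    a, b = (0.9, 0.5), (0.6, 0.9)
    for beta in (0.5, 1.0, 2.0):
        winner = "A" if f_beta(*a, beta) > f_beta(*b, beta) else "B"
        print(f"beta={beta}: prefer model {winner}")  # A, then B, then B
    rng = random.Random(0)
    # Hypothetical group labels; GBRF's actual grouping may differ.
    oversample_pmf = {"safe": 0.1, "borderline": 0.7, "outlier": 0.2}
    print(draw_from_pmf(oversample_pmf, 100, rng))
    print(mutate_pmf(oversample_pmf, rng))
```

With these made-up scores, $\beta$ = 0.5 prefers the precision-oriented model A, while $\beta$ = 1 and $\beta$ = 2 prefer the recall-oriented model B. This is the kind of preference shift that $\beta$ is meant to encode.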

Cite

Text

Carvalho et al. "GBRF: A Novel Framework for Encoding User-Preferences in Imbalanced Data Distributions via Genetic Optimization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06096-9_4

Markdown

[Carvalho et al. "GBRF: A Novel Framework for Encoding User-Preferences in Imbalanced Data Distributions via Genetic Optimization." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/carvalho2025ecmlpkdd-gbrf/) doi:10.1007/978-3-032-06096-9_4

BibTeX

@inproceedings{carvalho2025ecmlpkdd-gbrf,
  title     = {{GBRF: A Novel Framework for Encoding User-Preferences in Imbalanced Data Distributions via Genetic Optimization}},
  author    = {Carvalho, Miguel and Pinho, Armando J. and Brás, Susana},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {59-77},
  doi       = {10.1007/978-3-032-06096-9_4},
  url       = {https://mlanthology.org/ecmlpkdd/2025/carvalho2025ecmlpkdd-gbrf/}
}