Filtering Instances and Rejecting Predictions to Obtain Reliable Models in Healthcare

Abstract

Machine Learning (ML) models are widely used in high-stakes domains such as healthcare, where the reliability of predictions is critical. However, these models often fail to account for uncertainty, providing predictions even with low confidence. This work proposes a novel two-step data-centric approach to enhance the performance of ML models by improving data quality and filtering low-confidence predictions. The first step involves leveraging Instance Hardness (IH) to filter problematic instances during training, thereby refining the dataset. The second step introduces a confidence-based rejection mechanism during inference, ensuring that only reliable predictions are retained. We evaluate our approach using three real-world healthcare datasets, demonstrating its effectiveness at improving model reliability while balancing predictive performance and rejection rate. Additionally, we use alternative criteria (influence values for filtering and uncertainty for rejection) as baselines to evaluate the efficiency of the proposed method. The results demonstrate that integrating IH filtering with confidence-based rejection effectively enhances model performance while preserving a large proportion of instances. This approach provides a practical method for deploying ML systems in safety-critical applications.
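The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes instance hardness is estimated as one minus the mean probability that a pool of classifiers assigns to the true label, and that confidence is the largest predicted class probability. The function names, the thresholds (0.5 and 0.8), and the toy numbers are all hypothetical.

```python
# Illustrative sketch only: names, thresholds, and data are assumptions,
# not values taken from the paper.

def instance_hardness(true_class_probs):
    """Per-instance hardness: 1 - mean probability of the true label
    across a pool of classifiers (one inner list per instance)."""
    return [1.0 - sum(p) / len(p) for p in true_class_probs]

def filter_by_hardness(hardness, threshold=0.5):
    """Step 1 (training): keep indices of instances below the threshold."""
    return [i for i, h in enumerate(hardness) if h < threshold]

def predict_with_rejection(class_probs, min_confidence=0.8):
    """Step 2 (inference): return the predicted class index, or None
    (rejected) when the top class probability is below min_confidence."""
    predictions = []
    for probs in class_probs:
        best = max(range(len(probs)), key=probs.__getitem__)
        predictions.append(best if probs[best] >= min_confidence else None)
    return predictions

def rejection_rate(predictions):
    """Fraction of predictions that were rejected."""
    return sum(p is None for p in predictions) / len(predictions)

# Toy training data: probability of the true label from two classifiers.
true_class_probs = [[0.9, 0.8], [0.2, 0.3], [0.6, 0.7]]
hardness = instance_hardness(true_class_probs)
keep = filter_by_hardness(hardness)  # instance 1 is dropped as too hard

# Toy inference data: class probabilities from the final model.
class_probs = [[0.95, 0.05], [0.55, 0.45], [0.1, 0.9]]
preds = predict_with_rejection(class_probs)  # the second prediction is rejected
rate = rejection_rate(preds)
```

Raising `min_confidence` rejects more predictions in exchange for higher reliability on those retained, which is the balance between predictive performance and rejection rate that the abstract mentions.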

Cite

Text

Valeriano et al. "Filtering Instances and Rejecting Predictions to Obtain Reliable Models in Healthcare." Machine Learning, 2026. doi:10.1007/S10994-025-06941-8

Markdown

[Valeriano et al. "Filtering Instances and Rejecting Predictions to Obtain Reliable Models in Healthcare." Machine Learning, 2026.](https://mlanthology.org/mlj/2026/valeriano2026mlj-filtering/) doi:10.1007/S10994-025-06941-8

BibTeX

@article{valeriano2026mlj-filtering,
  title     = {{Filtering Instances and Rejecting Predictions to Obtain Reliable Models in Healthcare}},
  author    = {Valeriano, Maria Gabriela and Marzagão, David Kohan and Montelongo, Alfredo and Kiffer, Carlos Roberto Veiga and Katz, Natan and Lorena, Ana Carolina},
  journal   = {Machine Learning},
  year      = {2026},
  pages     = {15},
  doi       = {10.1007/S10994-025-06941-8},
  volume    = {115},
  url       = {https://mlanthology.org/mlj/2026/valeriano2026mlj-filtering/}
}