Federated Learning Towards the Unknown: A Deep Dive into Diabetic Retinopathy Prediction from Real-World EHR Structured Data on Unseen Diabetic Centers

Abstract

The compatibility of Federated Learning (FL) models with unseen Out-Of-Federation (OOF) centers remains a critical yet underexplored challenge, particularly when dealing with heterogeneous data. To address this gap, this study proposes a data-driven approach to assess the feasibility of applying an FL model to OOF centers. The case study is the prediction of diabetic retinopathy from multiple real-world, highly heterogeneous electronic health records. An FL XGBoost model (FL-XGB) is trained across five in-federation (IF) centers and reaches an average test Area Under the ROC Curve (AUC) of 75.27%. A novel metric, the OOF Applicability (OFA) predictor, is introduced to estimate whether FL-XGB can be safely applied to the 15 OOF centers. OFA combines statistical and learnable features from both IF and OOF centers and serves as the predictor in a regression model that estimates the performance of FL-XGB (in terms of AUC) on OOF datasets. The regression model achieved a confidence of 76% in predicting AUC values, with a statistically significant p-value ($p \ll 0.001$). The average discrepancy between the predicted and observed AUC values was 6%. Overall, FL-XGB shows robust performance on IF centers, and the OFA predictor plays a crucial role in assessing its applicability to unseen OOF centers. By providing statistically significant estimates, OFA identifies OOF centers whose characteristics diverge too far from what the FL model can handle. Our code is available at https://github.com/geronimaw/OFA4FL.
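The general idea of regressing observed AUC on a per-center divergence score can be illustrated with a small sketch. This is not the authors' OFA definition, which the abstract does not spell out: `shift_score` is a hypothetical stand-in (mean absolute standardized mean difference between IF and OOF features), the regression is plain least squares, and all data below are synthetic and exist only to make the example runnable.

```python
import numpy as np


def shift_score(if_features: np.ndarray, oof_features: np.ndarray) -> float:
    """Hypothetical stand-in for OFA: mean absolute standardized mean
    difference between pooled IF features and one OOF center's features."""
    mu_if = if_features.mean(axis=0)
    sd_if = if_features.std(axis=0) + 1e-12  # guard against zero variance
    mu_oof = oof_features.mean(axis=0)
    return float(np.mean(np.abs(mu_oof - mu_if) / sd_if))


def fit_line(scores, aucs) -> np.ndarray:
    """Ordinary least squares fit of auc ~ intercept + slope * score."""
    scores = np.asarray(scores, dtype=float)
    aucs = np.asarray(aucs, dtype=float)
    design = np.column_stack([np.ones_like(scores), scores])
    coef, *_ = np.linalg.lstsq(design, aucs, rcond=None)
    return coef  # [intercept, slope]


def predict_auc(coef: np.ndarray, score: float) -> float:
    """Predicted AUC for a new center, clipped to the valid range [0, 1]."""
    return float(np.clip(coef[0] + coef[1] * score, 0.0, 1.0))


# Synthetic demonstration only: none of these numbers come from the paper.
rng = np.random.default_rng(0)
if_features = rng.normal(size=(500, 4))  # pooled in-federation features
centers = [rng.normal(loc=shift, size=(200, 4)) for shift in (0.0, 0.3, 0.6, 0.9)]
scores = [shift_score(if_features, c) for c in centers]
observed_aucs = [0.76, 0.73, 0.69, 0.66]  # made-up AUCs for the synthetic centers

coef = fit_line(scores, observed_aucs)
new_center = rng.normal(loc=0.5, size=(200, 4))
estimated_auc = predict_auc(coef, shift_score(if_features, new_center))
```

Under these assumptions, a center with a large score would receive a low estimated AUC and could be flagged as unsuitable for the federated model before any labels from that center are used.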

Cite

Text

Cacciatore et al. "Federated Learning Towards the Unknown: A Deep Dive into Diabetic Retinopathy Prediction from Real-World EHR Structured Data on Unseen Diabetic Centers." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06118-8_20

Markdown

[Cacciatore et al. "Federated Learning Towards the Unknown: A Deep Dive into Diabetic Retinopathy Prediction from Real-World EHR Structured Data on Unseen Diabetic Centers." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/cacciatore2025ecmlpkdd-federated/) doi:10.1007/978-3-032-06118-8_20

BibTeX

@inproceedings{cacciatore2025ecmlpkdd-federated,
  title     = {{Federated Learning Towards the Unknown: A Deep Dive into Diabetic Retinopathy Prediction from Real-World EHR Structured Data on Unseen Diabetic Centers}},
  author    = {Cacciatore, Alessandro and Di Cosmo, Mariachiara and Frontoni, Emanuele and Bernardini, Michele},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {338--355},
  doi       = {10.1007/978-3-032-06118-8_20},
  url       = {https://mlanthology.org/ecmlpkdd/2025/cacciatore2025ecmlpkdd-federated/}
}