Can Diffusion Models Generalize? Privacy and Fairness Trade-Offs for Medical Data Sharing.

Abstract

The recent surge in options for diffusion model-based synthetic data sharing offers significant benefits for medical research, provided privacy and fairness concerns are addressed. Generative models risk memorizing sensitive training samples, potentially exposing identifiable information. Simultaneously, underrepresented features -- such as rare diseases, uncommon medical devices, or infrequent patient ethnicities -- are often not learned well, creating unfair biases in downstream applications. Our work unifies these challenges by leveraging artificially generated fingerprints (SAFs) in the training data as a controllable test for memorization and fairness. Specifically, we measure whether a diffusion model reproduces these fingerprints verbatim (a privacy breach) or ignores them entirely (a fairness violation), and we introduce an indicator t' that quantifies how likely a finished model is to reproduce training samples. Extensive experiments on real and synthetic medical imaging datasets reveal that naïve diffusion model training can lead to privacy leaks or unfair coverage. By systematically incorporating SAFs and monitoring t', we demonstrate how to balance privacy and fairness objectives. Our evaluation framework provides actionable guidance for designing generative models that preserve patient anonymity without excluding underrepresented patient subgroups. Code is available at https://github.com/MischaD/Privacy.

Cite

Text

Dombrowski and Kainz. "Can Diffusion Models Generalize? Privacy and Fairness Trade-Offs for Medical Data Sharing." Medical Imaging with Deep Learning, 2025.

Markdown

[Dombrowski and Kainz. "Can Diffusion Models Generalize? Privacy and Fairness Trade-Offs for Medical Data Sharing." Medical Imaging with Deep Learning, 2025.](https://mlanthology.org/midl/2025/dombrowski2025midl-diffusion/)

BibTeX

@inproceedings{dombrowski2025midl-diffusion,
  title     = {{Can Diffusion Models Generalize? Privacy and Fairness Trade-Offs for Medical Data Sharing.}},
  author    = {Dombrowski, Mischa and Kainz, Bernhard},
  booktitle = {Medical Imaging with Deep Learning},
  year      = {2025},
  url       = {https://mlanthology.org/midl/2025/dombrowski2025midl-diffusion/}
}