Synthetic Health-Related Longitudinal Data with Mixed-Type Variables Generated Using Diffusion Models
Abstract
This paper introduces a novel method for simulating Electronic Health Records (EHRs) using Diffusion Probabilistic Models (DPMs). We showcase the ability of DPMs to generate longitudinal EHRs with mixed-type variables – numeric, binary, and categorical. Our approach is benchmarked against existing Generative Adversarial Network (GAN)-based methods in two clinical scenarios: management of acute hypotension in the intensive care unit and antiretroviral therapy for people with human immunodeficiency virus. Our DPM-simulated datasets not only minimise patient disclosure risk but also outperform GAN-generated datasets in terms of realism. These datasets also prove effective for training downstream machine learning algorithms, including reinforcement learning and Cox proportional hazards models for survival analysis.
Cite
Kuo et al. "Synthetic Health-Related Longitudinal Data with Mixed-Type Variables Generated Using Diffusion Models." NeurIPS 2023 Workshops: SyntheticData4ML, 2023.
BibTeX
@inproceedings{kuo2023neuripsw-synthetic,
title = {{Synthetic Health-Related Longitudinal Data with Mixed-Type Variables Generated Using Diffusion Models}},
author = {Kuo, Nicholas I-Hsien and Garcia, Federico and Sonnerborg, Anders and Bohm, Michael and Kaiser, Rolf and Zazzi, Maurizio and Jorm, Louisa and Barbieri, Sebastiano},
booktitle = {NeurIPS 2023 Workshops: SyntheticData4ML},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/kuo2023neuripsw-synthetic/}
}