SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records Using Decoder-Only Transformers

Abstract

Generating synthetic Electronic Health Records (EHRs) offers significant potential for data augmentation, privacy-preserving data sharing, and improved machine learning model training. We propose a novel tokenization strategy tailored for structured EHR data, which encompasses diverse data types such as covariates, ICD codes, and irregularly sampled time series. Using a GPT-like decoder-only transformer model, we demonstrate the generation of high-quality synthetic EHRs. We evaluate our approach on the MIMIC-III dataset and benchmark the fidelity, utility, and privacy of the generated data against state-of-the-art models.
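
To make the idea of a single token stream for mixed-type records concrete, the following is a minimal illustrative sketch, not the tokenization proposed in the paper, whose details the abstract does not give. It assumes categorical covariates and ICD codes each become one token, and that each time-series measurement becomes a binned time-gap token followed by a binned value token. The bin edges, the token names, and the `tokenize_patient` helper are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical bin edges; a real pipeline would derive these from training data.
TIME_GAP_EDGES_H = [1, 4, 12, 24]  # hours since the previous measurement
VALUE_EDGES = {"HR": [60, 80, 100, 120], "SBP": [90, 110, 130, 150]}


def bin_token(prefix, value, edges):
    """Map a numeric value to a discrete token such as 'HR_BIN2'."""
    return f"{prefix}_BIN{bisect_right(edges, value)}"


def tokenize_patient(covariates, icd_codes, events):
    """Flatten one record into a list of string tokens (assumed scheme).

    covariates: dict mapping a categorical name to its value
    icd_codes:  list of ICD code strings
    events:     list of (time_in_hours, variable, value) in any order
    """
    tokens = ["<BOS>"]
    # Static covariates first, in a fixed (sorted) order.
    tokens += [f"{name.upper()}_{value}" for name, value in sorted(covariates.items())]
    # One token per diagnosis code.
    tokens += [f"ICD_{code}" for code in icd_codes]
    tokens.append("<TS>")
    # Irregular sampling is encoded as a binned gap since the previous event.
    prev_t = 0.0
    for t, var, val in sorted(events):
        tokens.append(bin_token("DT", t - prev_t, TIME_GAP_EDGES_H))
        tokens.append(bin_token(var, val, VALUE_EDGES[var]))
        prev_t = t
    tokens.append("<EOS>")
    return tokens


example = tokenize_patient(
    {"gender": "F"},
    ["4280"],
    [(3.0, "SBP", 95), (0.5, "HR", 85)],
)
print(example)
```

A sequence of this kind could then be mapped to integer ids and used to train an autoregressive decoder-only model with next-token prediction; sampling from the model and inverting the binning would yield a synthetic record under these assumptions.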

Cite

Text

Karami et al. "SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records Using Decoder-Only Transformers." NeurIPS 2024 Workshops: GenAI4Health, 2024.

Markdown

[Karami et al. "SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records Using Decoder-Only Transformers." NeurIPS 2024 Workshops: GenAI4Health, 2024.](https://mlanthology.org/neuripsw/2024/karami2024neuripsw-synehrgy/)

BibTeX

@inproceedings{karami2024neuripsw-synehrgy,
  title     = {{SynEHRgy: Synthesizing Mixed-Type Structured Electronic Health Records Using Decoder-Only Transformers}},
  author    = {Karami, Hojjat and Atienza, David and Paraschiv-Ionescu, Anisoara},
  booktitle = {NeurIPS 2024 Workshops: GenAI4Health},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/karami2024neuripsw-synehrgy/}
}