Advancing Medical Multimodal Learning and Data Generation with Diffusion Model and LLM
Abstract
Synthesizing electronic health records (EHRs) is essential for addressing data scarcity, bias, and fairness concerns in healthcare models. EHR data are inherently multimodal and sequential, encompassing structured codes, clinical notes, medical images, and irregular time intervals. Traditional generative models such as GANs and VAEs struggle to capture these complexities, while diffusion-based models offer improvements but remain limited to task-specific applications. To address these challenges, two diffusion-based models, MedDiffusion and EHRPD, have been developed. MedDiffusion enhances health risk prediction by generating synthetic patient data and capturing visit-level relationships, while EHRPD generates sequential, multimodal EHR data, incorporating temporal interval estimation to improve diversity and fidelity. Future work aims to overcome limitations in multimodal data generation by developing a generalized model capable of handling diverse modalities simultaneously, expanding the applicability of EHR data generation across healthcare tasks.
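To make the diffusion mechanism behind both models concrete, below is a minimal sketch of the DDPM-style forward noising and single reverse denoising step that diffusion-based EHR generators build on. Everything here is an illustrative assumption rather than either paper's actual implementation: patients are represented as hypothetical visit-embedding matrices, the noise schedule is linear, and the true noise stands in for a learned noise predictor.

# Minimal DDPM-style sketch (illustrative only; not MedDiffusion's or EHRPD's code).
# A patient is modeled as a matrix of visit embeddings; we show how Gaussian noise
# is added at step t and how one reverse step would denoise it.
import numpy as np

T = 1000                                  # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.02, T)        # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)           # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng):
    """Forward process: sample x_t ~ q(x_t | x_0) in closed form."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

def p_step(xt, t, eps_hat, rng):
    """One reverse (denoising) step given a noise estimate eps_hat.
    In the actual models, eps_hat would come from a learned network conditioned
    on visit-level relationships (MedDiffusion) or estimated time intervals
    between visits (EHRPD)."""
    coef = betas[t] / np.sqrt(1.0 - alpha_bars[t])
    mean = (xt - coef * eps_hat) / np.sqrt(alphas[t])
    noise = rng.standard_normal(xt.shape) if t > 0 else 0.0
    return mean + np.sqrt(betas[t]) * noise

rng = np.random.default_rng(0)
visits = rng.standard_normal((5, 64))     # hypothetical: 5 visits x 64-dim embeddings
x_t, eps = q_sample(visits, t=500, rng=rng)
x_prev = p_step(x_t, t=500, eps_hat=eps, rng=rng)  # true noise used as a stand-in

Sampling a synthetic patient record would iterate p_step from t = T-1 down to 0 starting from pure noise; the conditioning signal injected into the noise predictor is where the two models differ.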
Cite
Text
Zhong. "Advancing Medical Multimodal Learning and Data Generation with Diffusion Model and LLM." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I28.35237Markdown
[Zhong. "Advancing Medical Multimodal Learning and Data Generation with Diffusion Model and LLM." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhong2025aaai-advancing/) doi:10.1609/AAAI.V39I28.35237BibTeX
@inproceedings{zhong2025aaai-advancing,
title = {{Advancing Medical Multimodal Learning and Data Generation with Diffusion Model and LLM}},
author = {Zhong, Yuan},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {29319-29320},
doi = {10.1609/AAAI.V39I28.35237},
url = {https://mlanthology.org/aaai/2025/zhong2025aaai-advancing/}
}