Vision-Language Generative Model for View-Specific Chest X-Ray Generation
Abstract
Synthetic medical data generation has opened up new possibilities in the healthcare domain, offering a powerful tool for simulating clinical scenarios, enhancing diagnostic and treatment quality, gaining granular medical knowledge, and accelerating the development of unbiased algorithms. In this context, we present a novel approach called ViewXGen, designed to overcome the limitations of existing methods that rely on general-domain pipelines using only radiology reports to generate frontal-view chest X-rays. Our approach takes into consideration the diverse view positions found in the dataset, enabling the generation of chest X-rays with specific views, which marks a significant advancement in the field. To achieve this, we introduce a set of specially designed tokens for each view position, tailoring the generation process to the user’s preferences. Furthermore, we leverage multi-view chest X-rays as input, incorporating valuable information from different views within the same study. This integration rectifies potential errors and contributes to faithfully capturing abnormal findings in chest X-ray generation. To validate the effectiveness of our approach, we conducted statistical analyses, evaluating its performance on a clinical efficacy metric on the MIMIC-CXR dataset. Human evaluation also demonstrates the remarkable capabilities of ViewXGen, particularly in producing realistic view-specific X-rays that closely resemble the original images.
Cite
Text
Lee et al. "Vision-Language Generative Model for View-Specific Chest X-Ray Generation." Proceedings of the fifth Conference on Health, Inference, and Learning, 2024.
Markdown
[Lee et al. "Vision-Language Generative Model for View-Specific Chest X-Ray Generation." Proceedings of the fifth Conference on Health, Inference, and Learning, 2024.](https://mlanthology.org/chil/2024/lee2024chil-visionlanguage/)
BibTeX
@inproceedings{lee2024chil-visionlanguage,
title = {{Vision-Language Generative Model for View-Specific Chest X-Ray Generation}},
author = {Lee, Hyungyung and Lee, Da Young and Kim, Wonjae and Kim, Jin-Hwa and Kim, Tackeun and Kim, Jihang and Sunwoo, Leonard and Choi, Edward},
booktitle = {Proceedings of the fifth Conference on Health, Inference, and Learning},
year = {2024},
pages = {280-296},
volume = {248},
url = {https://mlanthology.org/chil/2024/lee2024chil-visionlanguage/}
}