Diffusion-Based Visual Representation Learning for Medical Question Answering
Abstract
Medical visual question answering (Med-VQA) aims to correctly answer a medical question based on a given image. A major challenge is the scarcity of large, professionally labeled datasets for training, which hinders feature extraction, especially for medical images. To overcome this, we propose a method to learn transferable visual representations based on a conditional denoising diffusion probabilistic model (conditional DDPM). Specifically, we collect a large amount of unlabeled radiological images and train a conditional DDPM in an auto-encoder paradigm to obtain a model that can extract high-level semantic information from medical images. The pre-trained model serves as a well-initialized visual feature extractor and can be easily adapted to any Med-VQA system. We build our Med-VQA system following the state-of-the-art Med-VQA architecture and replace its visual extractor with our pre-trained model. Our proposed method outperforms the state-of-the-art Med-VQA method on VQA-RAD and achieves comparable results on SLAKE.
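The abstract describes the approach only at a high level. The sketch below illustrates the general diffusion-autoencoder pretraining pattern it outlines: an encoder maps a clean image to a semantic code, and a conditional DDPM learns to denoise a noised copy of that image given the code, after which the encoder can be reused as a visual feature extractor. All module names (`SemanticEncoder`, `ConditionalDenoiser`), network shapes, and hyperparameters here are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of conditional-DDPM representation pretraining, assuming a
# diffusion-autoencoder-style setup; module names and sizes are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

T = 1000                                          # diffusion steps
betas = torch.linspace(1e-4, 0.02, T)             # linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # cumulative \bar{alpha}_t

class SemanticEncoder(nn.Module):
    """Maps an image to a compact semantic code z; after pretraining this is
    the piece reused as the Med-VQA visual feature extractor."""
    def __init__(self, z_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, z_dim),
        )
    def forward(self, x):
        return self.net(x)

class ConditionalDenoiser(nn.Module):
    """Predicts the noise added at step t, conditioned on z (a toy stand-in
    for the U-Net a real DDPM would use)."""
    def __init__(self, z_dim=256):
        super().__init__()
        self.embed = nn.Linear(z_dim + 1, 64)     # fuse z with scaled t
        self.net = nn.Sequential(
            nn.Conv2d(1 + 64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 3, padding=1),
        )
    def forward(self, x_t, t, z):
        cond = self.embed(torch.cat([z, t.float().unsqueeze(1) / T], dim=1))
        cond = cond[:, :, None, None].expand(-1, -1, x_t.size(2), x_t.size(3))
        return self.net(torch.cat([x_t, cond], dim=1))

encoder, denoiser = SemanticEncoder(), ConditionalDenoiser()
opt = torch.optim.Adam(
    list(encoder.parameters()) + list(denoiser.parameters()), lr=1e-4)

def train_step(x0):
    """One DDPM step: noise the image via q(x_t | x_0), then predict the
    noise conditioned on the semantic code of the clean image."""
    b = x0.size(0)
    t = torch.randint(0, T, (b,))
    eps = torch.randn_like(x0)
    ab = alphas_bar[t].view(b, 1, 1, 1)
    x_t = ab.sqrt() * x0 + (1 - ab).sqrt() * eps  # forward diffusion
    z = encoder(x0)                               # semantic condition
    loss = F.mse_loss(denoiser(x_t, t, z), eps)   # simple-DDPM objective
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Smoke test on random 64x64 grayscale batches standing in for radiographs.
for step in range(3):
    print(train_step(torch.randn(8, 1, 64, 64)))
```

In this pattern the denoiser is discarded after pretraining; only the encoder's weights carry over, which is what lets the learned representation plug into an existing Med-VQA pipeline in place of its original visual backbone.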
Cite
Text
Bian et al. "Diffusion-Based Visual Representation Learning for Medical Question Answering." Proceedings of the 15th Asian Conference on Machine Learning, 2023.
Markdown
[Bian et al. "Diffusion-Based Visual Representation Learning for Medical Question Answering." Proceedings of the 15th Asian Conference on Machine Learning, 2023.](https://mlanthology.org/acml/2023/bian2023acml-diffusionbased/)
BibTeX
@inproceedings{bian2023acml-diffusionbased,
title = {{Diffusion-Based Visual Representation Learning for Medical Question Answering}},
author = {Bian, Dexin and Wang, Xiaoru and Li, Meifang},
booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
year = {2023},
pages = {169--184},
volume = {222},
url = {https://mlanthology.org/acml/2023/bian2023acml-diffusionbased/}
}