RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering

Butsanets, Léo; Corbière, Charles; Khlaut, Julien; Manceron, Pierre; Dancette, Corentin

MIDL 2026 pp. 3036-3068

Abstract

In this work, we introduce RadImageNet-VQA, a large-scale dataset designed to advance radiologic visual question answering (VQA) on CT and MRI exams. While existing medical VQA datasets are limited in scale, dominated by X-ray imaging or biomedical illustrations, and prone to text-based shortcuts, RadImageNet-VQA is built from expert-curated annotations and provides 750K images paired with 7.5M QA samples. It covers three key tasks—abnormality detection, anatomy recognition, and pathology identification—spanning 8 anatomical regions and 97 pathology categories, and supports open-ended, closed-ended, and multiple-choice questions. Extensive experiments show that state-of-the-art vision-language models still struggle with fine-grained pathology identification, especially in open-ended settings and even after fine-tuning. Text-only analysis further reveals that model accuracies collapse to near-random without image inputs, confirming that RadImageNet-VQA is free from linguistic shortcuts.
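The text-only analysis compares a model's accuracy when the image is withheld against chance level. Below is a minimal sketch of that comparison. It assumes option-based questions (closed-ended yes/no and multiple-choice) and uses uniform random guessing as the chance baseline. Open-ended answers have no finite option set, so they are left out. The `QASample` record, its fields, and the `chance_accuracy` and `shortcut_gap` helpers are hypothetical illustrations, not the paper's evaluation code or the dataset's schema.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class QASample:
    # Hypothetical record layout for illustration; not the dataset's schema.
    question_type: str  # e.g. "closed" (yes/no) or "multiple_choice"
    num_options: int    # number of answer options shown to the model
    answer: str         # ground-truth answer string


def chance_accuracy(sample: QASample) -> float:
    """Expected accuracy of guessing uniformly among the offered options."""
    return 1.0 / sample.num_options


def shortcut_gap(blind_predictions: Sequence[str], samples: Sequence[QASample]) -> float:
    """Text-only (blind) accuracy minus the mean chance-level accuracy.

    A gap close to zero means the model does little better than guessing
    when the image is withheld, i.e. no strong text-only shortcut.
    """
    if not samples or len(blind_predictions) != len(samples):
        raise ValueError("need one prediction per sample, and at least one sample")
    correct = sum(p == s.answer for p, s in zip(blind_predictions, samples))
    blind_accuracy = correct / len(samples)
    chance = sum(chance_accuracy(s) for s in samples) / len(samples)
    return blind_accuracy - chance


samples = [
    QASample("closed", 2, "yes"),
    QASample("closed", 2, "no"),
    QASample("multiple_choice", 4, "A"),
    QASample("multiple_choice", 4, "C"),
]
blind_predictions = ["yes", "yes", "A", "B"]
print(shortcut_gap(blind_predictions, samples))  # 0.125
```

Here the blind accuracy is 2/4 = 0.5 and the mean chance level is (0.5 + 0.5 + 0.25 + 0.25) / 4 = 0.375, giving a gap of 0.125. Uniform guessing is only one possible chance baseline. A majority-class baseline is stricter when answer distributions are imbalanced.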

Cite

Text

Butsanets et al. "RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering." Proceedings of The 9th International Conference on Medical Imaging with Deep Learning, 2026.

BibTeX

@inproceedings{butsanets2026midl-radimagenetvqa,
  title     = {{RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering}},
  author    = {Butsanets, Léo and Corbière, Charles and Khlaut, Julien and Manceron, Pierre and Dancette, Corentin},
  booktitle = {Proceedings of The 9th International Conference on Medical Imaging with Deep Learning},
  year      = {2026},
  pages     = {3036--3068},
  volume    = {315},
  url       = {https://mlanthology.org/midl/2026/butsanets2026midl-radimagenetvqa/}
}