MiniGPT-Med: A Unified Vision-Language Model for Radiology Image Understanding

Alkhaldi, Asma; Alnajim, Raneem; Alabdullatef, Layan; Alyahya, Rawan; Chen, Jun; Zhu, Deyao; Alsinan, Ahmed Z.; Elhoseiny, Mohamed

TMLR 2026

Abstract

Recent advances in artificial intelligence (AI) have led to significant breakthroughs in healthcare, particularly in refining diagnostic procedures. However, existing studies have been limited in functional coverage. This study introduces MiniGPT-Med, a vision-language model adapted from MiniGPT-v2 for medical applications through domain-specific fine-tuning on medical datasets. MiniGPT-Med is versatile across imaging modalities, including X-rays, CT scans, and MRIs. The model can perform tasks such as medical report generation, visual question answering (VQA), and disease identification within medical imagery. Its integrated processing of both image and textual clinical data markedly improves diagnostic accuracy. Our empirical assessments confirm the superior performance of MiniGPT-Med on disease detection, medical report generation, and VQA benchmarks, representing a significant step towards closing the gap in assisting radiology practice. In particular, it achieves state-of-the-art performance in medical report generation, with gains in BERT-Sim of 17 points over specialist baselines and 12 points over generalist baselines. MiniGPT-Med promises to become a unified vision-language model for radiology diagnosis, enhancing diagnostic efficiency across a wide range of medical imaging applications.

Cite

Text

Alkhaldi et al. "MiniGPT-Med: A Unified Vision-Language Model for Radiology Image Understanding." Transactions on Machine Learning Research, 2026.

Markdown

[Alkhaldi et al. "MiniGPT-Med: A Unified Vision-Language Model for Radiology Image Understanding." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/alkhaldi2026tmlr-minigptmed/)

BibTeX

@article{alkhaldi2026tmlr-minigptmed,
  title     = {{MiniGPT-Med: A Unified Vision-Language Model for Radiology Image Understanding}},
  author    = {Alkhaldi, Asma and Alnajim, Raneem and Alabdullatef, Layan and Alyahya, Rawan and Chen, Jun and Zhu, Deyao and Alsinan, Ahmed Z. and Elhoseiny, Mohamed},
  journal   = {Transactions on Machine Learning Research},
  year      = {2026},
  url       = {https://mlanthology.org/tmlr/2026/alkhaldi2026tmlr-minigptmed/}
}