ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models

Abstract

State-of-the-art medical multi-modal LLMs (med-MLLMs), such as LLaVA-Med and BioMedGPT, primarily depend on scaling model size and data volume, with training driven largely by autoregressive objectives. However, we reveal that this approach can lead to weak vision-language alignment, making these models overly dependent on costly instruction-following data. To address this, we introduce ExGra-Med, a novel multi-graph alignment framework that jointly aligns images, instruction responses, and extended captions in the latent space, advancing semantic grounding and cross-modal coherence. To scale to large LLMs (e.g., LLaMa-7B), we develop an efficient end-to-end training scheme using black-box gradient estimation, enabling fast and scalable optimization. Empirically, ExGra-Med matches LLaVA-Med’s performance using just 10% of pre-training data, achieving a 20.13% gain on VQA-RAD and approaching full-data performance. It also outperforms strong baselines like BioMedGPT and RadFM on visual chatbot and zero-shot classification tasks, demonstrating its promise for efficient, high-quality vision-language integration in medical AI.

Cite

Text

Nguyen et al. "ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models." Advances in Neural Information Processing Systems, 2025.

Markdown

[Nguyen et al. "ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/nguyen2025neurips-exgramed/)

BibTeX

@inproceedings{nguyen2025neurips-exgramed,
  title     = {{ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models}},
  author    = {Nguyen, Duy Minh Ho and Diep, Nghiem Tuong and Nguyen, Trung Quoc and Le, Hoang-Bao and Nguyen, Tai and Nguyen, Anh-Tien and Nguyen, TrungTin and Ho, Nhat and Xie, Pengtao and Wattenhofer, Roger and Sonntag, Daniel and Zou, James and Niepert, Mathias},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/nguyen2025neurips-exgramed/}
}