RAD: Towards Trustworthy Retrieval-Augmented Multi-Modal Clinical Diagnosis

Abstract

Clinical diagnosis is a highly specialized discipline that requires both domain expertise and strict adherence to rigorous guidelines. Current AI-driven medical research predominantly incorporates medical knowledge through knowledge graphs or through pretraining on natural-language text. These approaches rely mainly on knowledge implicitly encoded in model parameters and neglect the task-specific knowledge required by diverse downstream tasks. To address this limitation, we propose **R**etrieval-**A**ugmented **D**iagnosis (RAD), a novel framework that explicitly injects external knowledge into multimodal models directly on downstream tasks. Specifically, RAD operates through three key mechanisms: (1) retrieval and refinement of disease-centered knowledge from multiple medical sources; (2) a guideline-enhanced contrastive loss that constrains the latent distance between multimodal features and guideline knowledge; and (3) a dual transformer decoder that uses guidelines as queries to steer cross-modal fusion. Together, these mechanisms align the model with the clinical diagnostic workflow, from guideline acquisition to feature extraction and decision-making. Moreover, because quantitative evaluation of interpretability for multimodal diagnostic models is lacking, we introduce a set of criteria to assess interpretability from both the image and the text perspectives. Extensive evaluations on four datasets covering different anatomies demonstrate RAD's generalizability, with RAD achieving state-of-the-art performance. Furthermore, RAD enables the model to concentrate more precisely on abnormal regions and critical indicators, ensuring evidence-based, trustworthy diagnosis. Our code is available at https://github.com/tdlhl/RAD.
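The abstract states only that the guideline-enhanced contrastive loss "constrains the latent distance between multimodal features and guideline knowledge." As a rough illustration of that idea, the sketch below assumes an InfoNCE-style formulation: each sample's fused image-text feature is pulled toward the embedding of its own disease class's guideline and pushed away from the other classes' guidelines. The function names, the one-guideline-embedding-per-class setup, and the temperature default of 0.07 are hypothetical assumptions, not RAD's published loss; consult the paper and the linked repository for the actual definition.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def guideline_contrastive_loss(
    features: List[Vector],
    guideline_embeddings: List[Vector],
    labels: List[int],
    temperature: float = 0.07,  # hypothetical default, not taken from the paper
) -> float:
    """Hypothetical InfoNCE-style loss (an assumption, not RAD's exact loss).

    features: one fused multimodal feature vector per sample.
    guideline_embeddings: one embedding per disease class (list index = class id).
    labels: class id of each sample.
    Returns the mean cross-entropy of picking the sample's own guideline.
    """
    if not features or len(features) != len(labels):
        raise ValueError("features and labels must be non-empty and equal length")
    guides = [_normalize(g) for g in guideline_embeddings]
    total = 0.0
    for feat, label in zip(features, labels):
        f = _normalize(feat)
        # Temperature-scaled cosine similarity to every class's guideline.
        logits = [sum(a * b for a, b in zip(f, g)) / temperature for g in guides]
        # Numerically stable log-sum-exp over all guidelines.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability of the sample's own (positive) guideline.
        total += log_denominator - logits[label]
    return total / len(features)


# Usage: features near their own class's guideline give a lower loss.
guides = [[1.0, 0.0], [0.0, 1.0]]
feats = [[0.9, 0.1], [0.1, 0.9]]
aligned = guideline_contrastive_loss(feats, guides, [0, 1])
misaligned = guideline_contrastive_loss(feats, guides, [1, 0])
print(aligned < misaligned)  # True
```

Normalizing both sides makes the loss depend only on the angle between a feature and each guideline, which is one common way to "constrain latent distance"; a margin- or distance-based loss would be an equally plausible reading of the abstract.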

Cite

Li et al. "RAD: Towards Trustworthy Retrieval-Augmented Multi-Modal Clinical Diagnosis." Advances in Neural Information Processing Systems, 2025.

BibTeX

@inproceedings{li2025neurips-rad,
  title     = {{RAD: Towards Trustworthy Retrieval-Augmented Multi-Modal Clinical Diagnosis}},
  author    = {Li, Haolin and Dai, Tianjie and Chen, Zhe and Du, Siyuan and Yao, Jiangchao and Zhang, Ya and Wang, Yanfeng},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/li2025neurips-rad/}
}