Multi-Modal Medical Diagnosis via Large-Small Model Collaboration

Abstract

Recent advances in medical AI show a clear trend toward large models in healthcare. However, building large models for multi-modal medical diagnosis remains challenging because modality-complete medical data are scarce. As a result, most existing multi-modal diagnostic models are relatively small and have limited feature-extraction capacity. To bridge this gap, we propose **AdaCoMed**, an **ada**ptive **co**llaborative-learning framework that integrates off-the-shelf single-modal **med**ical large models with multi-modal small models. The framework first uses a mixture-of-modality-experts (MoME) architecture to combine features extracted by multiple single-modal medical large models. It then introduces a novel adaptive co-learning mechanism that collaborates with a multi-modal small model. Guided by an adaptive weighting strategy, this mechanism dynamically balances two complementary strengths: the MoME-fused large-model features and the cross-modal reasoning of the small model. Extensive experiments on two representative multi-modal medical datasets (MIMIC-IV-MM and MMIST ccRCC), covering six modalities and four diagnostic tasks, show consistent improvements over state-of-the-art baselines. These results suggest that AdaCoMed is a promising solution for real-world medical diagnosis.
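The two-stage design described in the abstract (MoME fusion of large-model features, then adaptive weighting against a small model) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' AdaCoMed implementation. The modality names (`mod_a`, `mod_b`), the per-modality gating vectors `gate`, and the temperature `tau` are hypothetical. The large-model features are assumed to be precomputed and projected to a common dimension. The entropy-based confidence weighting is a swapped-in stand-in for the paper's adaptive weighting strategy, whose exact form the abstract does not specify.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mome_fuse(features: Dict[str, Vector], gate: Dict[str, Vector]) -> Vector:
    """Hypothetical mixture-of-modality-experts fusion.

    `features` maps each *available* modality to a feature vector from a frozen
    single-modal large model (assumed already projected to a shared dimension).
    `gate` holds one hypothetical learnable gating vector per modality.
    Missing modalities are simply absent from `features`, so the softmax
    renormalises over the modalities that are present.
    """
    names = sorted(features)
    weights = softmax([dot(gate[n], features[n]) for n in names])
    fused = [0.0] * len(features[names[0]])
    for w, n in zip(weights, names):
        for i, x in enumerate(features[n]):
            fused[i] += w * x
    return fused


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def adaptive_combine(p_large: Vector, p_small: Vector, tau: float = 1.0) -> Vector:
    """Assumed confidence-based weighting of the two branches: the branch whose
    prediction has lower entropy (higher confidence) receives the larger weight.
    This is a stand-in, not the paper's adaptive weighting rule."""
    alpha_large, alpha_small = softmax(
        [-entropy(p_large) / tau, -entropy(p_small) / tau]
    )
    return [alpha_large * a + alpha_small * b for a, b in zip(p_large, p_small)]


if __name__ == "__main__":
    feats = {"mod_a": [1.0, 0.0], "mod_b": [0.0, 1.0]}  # hypothetical modalities
    gate = {"mod_a": [1.0, 0.0], "mod_b": [0.0, 1.0]}
    print(mome_fuse(feats, gate))  # equal gate scores, so features are averaged
    print(adaptive_combine([0.9, 0.1], [0.5, 0.5]))  # confident branch dominates
```

Because the gate's softmax runs only over the modalities present in `features`, a sample with a missing modality can still be fused. This is one simple way to cope with modality-incomplete data, though not necessarily the one the paper uses.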

Cite

Text

Chen et al. "Multi-Modal Medical Diagnosis via Large-Small Model Collaboration." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.02865

BibTeX

@inproceedings{chen2025cvpr-multimodal,
  title     = {{Multi-Modal Medical Diagnosis via Large-Small Model Collaboration}},
  author    = {Chen, Wanyi and Zhao, Zihua and Yao, Jiangchao and Zhang, Ya and Bu, Jiajun and Wang, Haishuai},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {30763--30773},
  doi       = {10.1109/CVPR52734.2025.02865},
  url       = {https://mlanthology.org/cvpr/2025/chen2025cvpr-multimodal/}
}