MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning
Abstract
Medical Large Vision-Language Models (Med-LVLMs) have shown strong potential in multimodal diagnostic tasks. However, existing single-agent models struggle to generalize across diverse medical specialties, limiting their performance. Recent efforts introduce multi-agent collaboration frameworks inspired by clinical workflows, where general practitioners (GPs) and specialists interact in a fixed sequence. Despite improvements, these static pipelines lack flexibility and adaptability in reasoning. To address this, we propose MMedAgent-RL, a reinforcement learning (RL)-based multi-agent framework that enables dynamic, optimized collaboration among medical agents. Specifically, we train two GP agents based on Qwen2.5-VL via RL: the triage doctor learns to assign patients to appropriate specialties, while the attending physician integrates the judgments of multiple specialists with its own knowledge to make final decisions. To address the inconsistency in specialist outputs, we introduce a curriculum learning (CL)-guided RL strategy with dynamic entropy regulation, progressively teaching the attending physician to balance imitating specialists against correcting their mistakes. Experiments on five medical VQA benchmarks demonstrate that MMedAgent-RL outperforms both open-source and proprietary Med-LVLMs. Notably, it achieves an average performance gain of 23.6% over strong baselines.
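As a rough illustration of the curriculum idea in the abstract, the sketch below orders training samples by how often the specialists agree with the ground truth, so that a learner first sees cases where imitating specialists is correct and later sees cases where they must be overruled. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Sample` type, the three stage names, and the accuracy thresholds are all hypothetical, and the RL training and dynamic entropy regulation themselves are not shown.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical training record (not from the paper)."""
    question: str
    answer: str                    # ground-truth option
    specialist_answers: List[str]  # answers returned by the specialist agents


def specialist_accuracy(sample: Sample) -> float:
    """Fraction of specialist answers that match the ground truth."""
    if not sample.specialist_answers:
        return 0.0
    hits = sum(a == sample.answer for a in sample.specialist_answers)
    return hits / len(sample.specialist_answers)


def curriculum_stage(sample: Sample) -> str:
    """Assumed staging rule: all specialists right -> easy, all wrong -> hard."""
    acc = specialist_accuracy(sample)
    if acc == 1.0:
        return "easy"    # imitating the specialists is sufficient
    if acc == 0.0:
        return "hard"    # the specialists must be corrected
    return "medium"      # the specialists disagree with each other


def build_curriculum(samples: List[Sample]) -> List[Sample]:
    """Order samples easy -> medium -> hard (stable within each stage)."""
    order = {"easy": 0, "medium": 1, "hard": 2}
    return sorted(samples, key=lambda s: order[curriculum_stage(s)])
```

Under these assumptions, a trainer would iterate over `build_curriculum(samples)` stage by stage; how the stages interact with the reward and entropy terms is specified in the paper rather than here.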
Cite
Text
Xia et al. "MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning." International Conference on Learning Representations, 2026.
Markdown
[Xia et al. "MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/xia2026iclr-mmedagentrl/)
BibTeX
@inproceedings{xia2026iclr-mmedagentrl,
title = {{MMedAgent-RL: Optimizing Multi-Agent Collaboration for Multimodal Medical Reasoning}},
author = {Xia, Peng and Wang, Jinglu and Peng, Yibo and Zeng, Kaide and Dong, Zihan and Wu, Xian and Tang, Xiangru and Zhu, Hongtu and Li, Yun and Zhang, Linjun and Liu, Shujie and Lu, Yan and Yao, Huaxiu},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/xia2026iclr-mmedagentrl/}
}