R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO
Abstract
In this work, we aim to incentivize the reasoning ability of Multimodal Large Language Models (MLLMs) via reinforcement learning (RL) and develop an effective approach that mitigates the sparse reward and advantage vanishing issues during RL. To this end, we propose Share-GRPO, a novel RL approach that tackles these issues by exploring and sharing diverse reasoning trajectories over an expanded question space. Specifically, Share-GRPO first expands the question space for a given question via data transformation techniques, then encourages the MLLM to explore diverse reasoning trajectories over the expanded question space, and shares the discovered reasoning trajectories across the expanded questions during RL. In addition, Share-GRPO shares reward information during advantage computation: it estimates solution advantages hierarchically, across and within question variants, allowing more accurate estimation of relative advantages and improving the stability of policy training. Extensive evaluations on six widely used reasoning benchmarks showcase the superior performance of our method. Code is available at https://github.com/HJYao00/R1-ShareVL.
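The abstract does not give the advantage formula, so the following is only a minimal sketch of what "hierarchical" estimation across and within question variants could look like. It assumes scalar rewards, standard-score normalization at both levels, and an equal-weight average of the two levels; the function names `normalize` and `hierarchical_advantages` and the 0.5 weights are hypothetical and not taken from the paper or its repository.

```python
from statistics import mean, pstdev


def normalize(values, eps=1e-6):
    """Standard-score a list of rewards (GRPO-style group normalization)."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


def hierarchical_advantages(rewards_by_variant, eps=1e-6):
    """Combine a global and a local advantage for every response.

    rewards_by_variant: one list of rewards per question variant.
    Global level: normalize over all responses of all variants.
    Local level: normalize within each variant.
    ASSUMPTION: the two levels are averaged with equal weight.
    """
    all_rewards = [r for rewards in rewards_by_variant for r in rewards]
    global_adv = normalize(all_rewards, eps)

    combined = []
    start = 0
    for rewards in rewards_by_variant:
        local_adv = normalize(rewards, eps)
        chunk = global_adv[start:start + len(rewards)]
        combined.append([0.5 * (g + l) for g, l in zip(chunk, local_adv)])
        start += len(rewards)
    return combined


# Variant 0 fails on every response; variant 1 succeeds on half of them.
advantages = hierarchical_advantages([[0, 0, 0, 0], [1, 0, 1, 0]])
```

Under these assumptions, the responses to variant 0 all receive the same reward, so their within-variant advantage is zero, but the shared cross-variant term still assigns them a non-zero (negative) advantage. This is one way sharing reward information could counter advantage vanishing; the paper's exact formulation may differ.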
Cite
Text
Yao et al. "R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO." Advances in Neural Information Processing Systems, 2025.
Markdown
[Yao et al. "R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/yao2025neurips-r1sharevl/)
BibTeX
@inproceedings{yao2025neurips-r1sharevl,
title = {{R1-ShareVL: Incentivizing Reasoning Capabilities of Multimodal Large Language Models via Share-GRPO}},
author = {Yao, Huanjin and Yin, Qixiang and Zhang, Jingyi and Yang, Min and Wang, Yibo and Wu, Wenhao and Su, Fei and Shen, Li and Qiu, Minghui and Tao, Dacheng and Huang, Jiaxing},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/yao2025neurips-r1sharevl/}
}