Diving into Self-Evolve Training for Multimodal Reasoning

Liu, Wei; Li, Junlong; Zhang, Xiwen; Zhou, Fan; Cheng, Yu; He, Junxian

ICLRW 2025

Abstract

Self-evolving training—where models iteratively learn from their own outputs—has emerged as a key approach for complex reasoning tasks, addressing the scarcity of high-quality chain-of-thought data. However, its effectiveness in multimodal reasoning, a domain more intricate than text-only reasoning, remains underexplored, and the understanding of critical factors in this training paradigm remains limited. Furthermore, a central challenge for this training method is performance saturation, which impedes further improvements and scalability. Inspired by reinforcement learning (RL), in this paper, we reframe self-evolving training for multimodal reasoning through the lens of RL, identifying three pivotal factors: $\textit{Training Method}$, $\textit{Reward Model}$, and $\textit{Prompt Variation}$. Through systematic analysis, we establish relatively optimal design principles that significantly enhance multimodal reasoning capabilities. Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STaR (**M**ultimodal **S**elf-evolving **T**r**a**ining for **R**easoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. All resources will be made publicly available.

Cite

Text

Liu et al. "Diving into Self-Evolve Training for Multimodal Reasoning." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.

Markdown

[Liu et al. "Diving into Self-Evolve Training for Multimodal Reasoning." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/liu2025iclrw-diving/)

BibTeX

@inproceedings{liu2025iclrw-diving,
  title     = {{Diving into Self-Evolve Training for Multimodal Reasoning}},
  author    = {Liu, Wei and Li, Junlong and Zhang, Xiwen and Zhou, Fan and Cheng, Yu and He, Junxian},
  booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/liu2025iclrw-diving/}
}