Diving into Self-Evolve Training for Multimodal Reasoning

Abstract

Self-evolving training, where models iteratively learn from their own outputs, has emerged as a key approach for complex reasoning tasks, addressing the scarcity of high-quality chain-of-thought data. However, its effectiveness in multimodal reasoning, a domain more intricate than text-only reasoning, remains underexplored, and the critical factors in this training paradigm are poorly understood. Furthermore, a central challenge for this training method is performance saturation, which impedes further improvement and scalability. In this paper, we reframe self-evolving training for multimodal reasoning through the lens of reinforcement learning (RL), identifying three pivotal factors: $\textit{Training Method}$, $\textit{Reward Model}$, and $\textit{Prompt Variation}$. Through systematic analysis, we establish relatively optimal design principles that significantly enhance multimodal reasoning capabilities. Moreover, delving deeper into training dynamics, we uncover the roots of saturation and propose a new automatic balancing mechanism to mitigate this limitation. Building on these insights, we propose M-STaR (**M**ultimodal **S**elf-evolving **T**r**a**ining for **R**easoning), a framework that achieves consistent performance gains across models of varying sizes and diverse benchmarks. All resources will be made publicly available.

Cite

Text

Liu et al. "Diving into Self-Evolve Training for Multimodal Reasoning." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.

Markdown

[Liu et al. "Diving into Self-Evolve Training for Multimodal Reasoning." ICLR 2025 Workshops: LLM_Reason_and_Plan, 2025.](https://mlanthology.org/iclrw/2025/liu2025iclrw-diving/)

BibTeX

@inproceedings{liu2025iclrw-diving,
  title     = {{Diving into Self-Evolve Training for Multimodal Reasoning}},
  author    = {Liu, Wei and Li, Junlong and Zhang, Xiwen and Zhou, Fan and Cheng, Yu and He, Junxian},
  booktitle = {ICLR 2025 Workshops: LLM_Reason_and_Plan},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/liu2025iclrw-diving/}
}