SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents
Abstract
Self-evolution, the ability of agents to autonomously improve their reasoning and behavior, is essential in embodied domains with long-horizon, real-world tasks. Although recent advances in reinforcement fine-tuning (RFT) have shown strong performance in enhancing reasoning in large language models (LLMs), its potential to enable self-evolving embodied intelligence with multi-modal interactions remains largely unexplored. Specifically, RFT faces two fundamental obstacles in embodied settings: (i) the lack of accessible intermediate rewards in multi-step reasoning tasks limits effective learning signals, and (ii) reliance on hand-crafted reward functions restricts generalization to novel tasks and environments. To address these challenges, we present *Self-Evolving Embodied Agents-R1*, **SEEA-R1**, the first RFT framework designed to enable the self-evolving capabilities of embodied agents. To convert sparse delayed rewards into denser intermediate signals that improve multi-step reasoning, we propose Tree-based Group Relative Policy Optimization (**Tree-GRPO**), which integrates Monte Carlo Tree Search (MCTS) into GRPO. To generalize reward estimation across tasks and scenes, supporting autonomous adaptation and reward-driven self-evolution, we further introduce a Multi-modal Generative Reward Model (**MGRM**). We evaluate SEEA-R1 on the ALFWorld benchmark, where it surpasses state-of-the-art methods with scores of 85.07% (textual) and 46.27% (multi-modal), outperforming prior models including GPT-4o. Without ground-truth rewards, SEEA-R1 still achieves 80.3% (textual) and 44.03% (multi-modal), surpassing all open-source baselines and highlighting its scalability as a self-evolving embodied agent. Additional experiments and qualitative analysis further support the potential of SEEA-R1 for future research in scalable embodied intelligence. The project page is at https://seea-r1.github.io/.
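The abstract names the mechanism but not its mechanics, so the following is a minimal Python sketch, under the assumption that Tree-GRPO applies GRPO's group-relative normalization to sibling nodes of an MCTS tree; the `TreeNode`, `backup`, and `group_relative_advantages` names are hypothetical, not from the paper.

```python
# Minimal sketch (not the authors' implementation) of the idea the abstract
# attributes to Tree-GRPO: actions expanded from the same MCTS node form a
# "group", and each action's Monte Carlo value estimate is normalized against
# its siblings, GRPO-style, so a sparse terminal reward becomes a dense
# per-step advantage. All names here are illustrative.

from dataclasses import dataclass, field
from statistics import mean, pstdev


@dataclass
class TreeNode:
    value: float = 0.0   # running mean of Monte Carlo returns through this node
    visits: int = 0
    children: list["TreeNode"] = field(default_factory=list)


def backup(path: list["TreeNode"], ret: float) -> None:
    """Standard MCTS backup: fold one rollout return into every node on the path."""
    for node in path:
        node.visits += 1
        node.value += (ret - node.value) / node.visits


def group_relative_advantages(parent: TreeNode, eps: float = 1e-8) -> list[float]:
    """GRPO-style normalization over the sibling group under one tree node."""
    values = [child.value for child in parent.children]
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / (sigma + eps) for v in values]


root = TreeNode(children=[TreeNode() for _ in range(3)])
# Pretend three rollouts were run, one through each child; only the second
# branch eventually solved the task (sparse terminal reward of 1.0).
for child, ret in zip(root.children, (0.0, 1.0, 0.0)):
    backup([root, child], ret)
print(group_relative_advantages(root))  # the successful branch gets a positive advantage
```

The point this illustrates: even with a single sparse terminal reward, every internal node of the search tree yields a normalized advantage for each of its children, giving the policy a learning signal at every step of a multi-step task.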
Cite
Text
Tian et al. "SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents." Advances in Neural Information Processing Systems, 2025.
Markdown
[Tian et al. "SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/tian2025neurips-seear1/)
BibTeX
@inproceedings{tian2025neurips-seear1,
title = {{SEEA-R1: Tree-Structured Reinforcement Fine-Tuning for Self-Evolving Embodied Agents}},
author = {Tian, Wanxin and Zhang, Shijie and Zhang, Kevin and Chi, Xiaowei and Fan, Chun-Kai and Lu, Junyu and Luo, Yulin and Zhou, Qiang and Zhao, Yiming and Liu, Ning and Lin, Siyu and Qin, Zhiyuan and Ju, Xiaozhu and Zhang, Shanghang and Tang, Jian},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/tian2025neurips-seear1/}
}