SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

Li, Haozhan; Zuo, Yuxin; Yu, Jiale; Zhang, Yuhao; Zhaohui, Yang; Zhang, Kaiyan; Zhu, Xuekai; Zhang, Yuchen; Chen, Tianxing; Cui, Ganqu; Wang, Dehui; Luo, Dingxiang; Fan, Yuchen; Sun, Youbang; Zeng, Jia; Pang, Jiangmiao; Zhang, Shanghang; Wang, Yu; Mu, Yao; Zhou, Bowen; Ding, Ning

ICLR 2026

Abstract

Vision-Language-Action (VLA) models have emerged as a powerful paradigm for robotic manipulation. Despite substantial progress enabled by large-scale pretraining and supervised fine-tuning (SFT), these models face two fundamental challenges: (i) the scarcity and high cost of the large-scale robotic trajectories required to scale SFT, and (ii) limited generalization to tasks under distribution shift. To overcome these limitations, we explore reinforcement learning (RL) as a pathway to scaling VLA training beyond limited datasets. Inspired by LLM breakthroughs in which RL with outcome rewards enhances step-by-step reasoning, we ask: can outcome-driven RL improve long-horizon, step-by-step action planning in VLA models? In this work, we introduce SimpleVLA-RL, an efficient RL framework tailored for VLA models. Building upon veRL, we introduce VLA-specific trajectory sampling, scalable parallelization, multi-environment rendering, and optimized loss computation. Applied to OpenVLA-OFT, SimpleVLA-RL achieves 99% of SoTA performance on LIBERO and an 80% relative improvement on RoboTwin 1.0 & 2.0, outperforming $\pi_0$ with our proposed exploration-enhancing strategies. SimpleVLA-RL reduces dependence on large-scale data, enables robust generalization, and remarkably surpasses SFT in real-world tasks. Moreover, we identify a novel phenomenon, "pushcut", during RL training, wherein the policy discovers previously unseen patterns beyond those encountered in the prior training process.

Cite

Text

Li et al. "SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-simplevlarl/)

BibTeX

@inproceedings{li2026iclr-simplevlarl,
  title     = {{SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning}},
  author    = {Li, Haozhan and Zuo, Yuxin and Yu, Jiale and Zhang, Yuhao and Zhaohui, Yang and Zhang, Kaiyan and Zhu, Xuekai and Zhang, Yuchen and Chen, Tianxing and Cui, Ganqu and Wang, Dehui and Luo, Dingxiang and Fan, Yuchen and Sun, Youbang and Zeng, Jia and Pang, Jiangmiao and Zhang, Shanghang and Wang, Yu and Mu, Yao and Zhou, Bowen and Ding, Ning},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-simplevlarl/}
}