Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation
Abstract
Vision-Language-Action (VLA) models represent a critical milestone toward embodied intelligence in robotic manipulation. To support their training, recent research has developed high-performance simulation engines for data synthesis. However, the usefulness of these engines is still significantly limited by the simulation-to-reality (Sim2Real) gap, as policies trained on synthetic data often fail to generalize reliably to the real world. To address this challenge, we present Sim2Real-VLA, a generalist robot control model trained exclusively on synthetic data, yet capable of transferring seamlessly to real-world manipulation tasks. Sim2Real-VLA features a dual-system architecture: a high-level planner that infers object-centered chains-of-affordances, and a low-level actor that executes and validates these plans in real time via a tokenized action space. This design filters out manipulation-irrelevant features and prioritizes motion-critical dynamics, thereby enhancing Sim2Real domain transfer. Sim2Real-VLA also integrates tightly with automated data generation for manipulation skills, eliminating the need for manual fine-tuning and enabling scalable, hands-free training. Empirical evaluations across bimanual, dexterous, and long-horizon tasks show that Sim2Real-VLA consistently outperforms previous VLA baselines across diverse real-world environments and domain shifts.
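To make the dual-system description concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes uniform binning as the action tokenizer (a common choice in VLA work; the abstract does not specify the scheme) and represents a chain of affordances as a list of (target, skill) steps. All names here (`NUM_BINS`, `Affordance`, `tokenize`, `detokenize`, `run_chain`) and the retry logic are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

NUM_BINS = 256          # assumed token vocabulary size per action dimension
LOW, HIGH = -1.0, 1.0   # assumed normalized range of each action dimension


def tokenize(action: List[float]) -> List[int]:
    """Map each continuous action value to a bin index (uniform binning)."""
    tokens = []
    for a in action:
        clipped = min(max(a, LOW), HIGH)
        idx = int((clipped - LOW) / (HIGH - LOW) * NUM_BINS)
        tokens.append(min(idx, NUM_BINS - 1))  # keep HIGH inside the last bin
    return tokens


def detokenize(tokens: List[int]) -> List[float]:
    """Map each bin index back to the center of its bin."""
    width = (HIGH - LOW) / NUM_BINS
    return [LOW + (t + 0.5) * width for t in tokens]


@dataclass(frozen=True)
class Affordance:
    """One step of a hypothetical object-centered chain of affordances."""
    target: str  # object the step is centered on
    skill: str   # e.g. "grasp" or "place"


def run_chain(
    chain: List[Affordance],
    actor: Callable[[Affordance], List[int]],
    validate: Callable[[Affordance, List[float]], bool],
    max_retries: int = 2,
) -> bool:
    """Execute each planned step; retry a step when validation fails."""
    for step in chain:
        for _ in range(max_retries + 1):
            command = detokenize(actor(step))  # actor emits action tokens
            if validate(step, command):
                break  # step succeeded, move to the next affordance
        else:
            return False  # retries exhausted for this step
    return True
```

In this sketch the planner's output is simply passed in as `chain`, and `actor` and `validate` are placeholders for learned components; the point is only the separation between a symbolic plan and token-level execution with a check after each step.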
Cite
Text
Zhao et al. "Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation." International Conference on Learning Representations, 2026.
Markdown
[Zhao et al. "Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-sim2real/)
BibTeX
@inproceedings{zhao2026iclr-sim2real,
title = {{Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation}},
author = {Zhao, Runyi and Xu, Sheng and Jin, Ruixing and Deng, Yueci and Tai, Yunxin and Jia, Kui and Liu, Guiliang},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zhao2026iclr-sim2real/}
}