Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation

Abstract

Vision-Language-Action (VLA) models represent a critical milestone toward embodied intelligence in robotic manipulation. To support their training, recent research has developed high-performance simulation engines for data synthesis. However, the effectiveness of such synthetic data is still significantly limited by the simulation-to-reality (Sim2Real) gap: policies trained on synthetic data often fail to generalize reliably to the real world. To address this challenge, we present Sim2Real-VLA, a generalist robot control model trained exclusively on synthetic data, yet capable of transferring seamlessly to real-world manipulation tasks. Sim2Real-VLA features a dual-system architecture: a high-level planner that infers object-centered chains-of-affordances, and a low-level actor that executes and validates these plans in real time via a tokenized action space. This design filters out manipulation-irrelevant features and prioritizes motion-critical dynamics, thereby enhancing Sim2Real domain transfer. Sim2Real-VLA also integrates tightly with automated data generation for manipulation skills, which eliminates the need for manual fine-tuning and enables scalable, hands-free training. Empirical evaluations across bimanual, dexterous, and long-horizon tasks show that Sim2Real-VLA consistently outperforms previous VLA baselines across diverse real-world environments and domain shifts.
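
To make the dual-system idea concrete, the following is a minimal Python sketch of a planner/actor loop, not the authors' implementation. The abstract does not specify the planner, the actor, or the tokenizer, so every name here (`Affordance`, `tokenize_action`, `detokenize_action`, `run_chain`) is hypothetical, and uniform binning is assumed only as one common way to tokenize a continuous action space.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Affordance:
    """One object-centered step in a plan (hypothetical structure)."""
    obj: str      # object the step is centered on, e.g. "mug"
    action: str   # interaction to perform, e.g. "grasp"


def tokenize_action(values: Sequence[float], low: float = -1.0,
                    high: float = 1.0, bins: int = 256) -> List[int]:
    """Map each continuous action dimension to an integer bin.

    Assumption: uniform binning over [low, high]; the paper's actual
    tokenizer is not described in the abstract.
    """
    tokens = []
    for v in values:
        clipped = min(max(v, low), high)
        idx = int((clipped - low) / (high - low) * bins)
        tokens.append(min(idx, bins - 1))  # keep v == high inside the last bin
    return tokens


def detokenize_action(tokens: Sequence[int], low: float = -1.0,
                      high: float = 1.0, bins: int = 256) -> List[float]:
    """Recover the bin-center value for each token."""
    width = (high - low) / bins
    return [low + (t + 0.5) * width for t in tokens]


def run_chain(chain: Sequence[Affordance],
              act: Callable[[Affordance], List[int]],
              validate: Callable[[Affordance, List[int]], bool],
              max_retries: int = 2) -> List[Affordance]:
    """Execute a chain of affordances step by step.

    `act` stands in for the low-level actor (returns action tokens) and
    `validate` for its check that the step succeeded. Execution stops at
    the first step that cannot be validated within the retry budget.
    """
    completed: List[Affordance] = []
    for step in chain:
        for _ in range(max_retries + 1):
            tokens = act(step)
            if validate(step, tokens):
                completed.append(step)
                break
        else:
            break  # retries exhausted: abandon the rest of the chain
    return completed


if __name__ == "__main__":
    # Toy plan that a high-level planner might emit (illustrative only).
    plan = [Affordance("mug", "grasp"), Affordance("mug", "lift"),
            Affordance("shelf", "place")]
    done = run_chain(
        plan,
        act=lambda step: tokenize_action([0.0, 0.5]),
        validate=lambda step, tokens: step.action != "lift",
    )
    print([f"{s.action}:{s.obj}" for s in done])
```

In this toy run the `lift` step never validates, so only the `grasp` step is reported as completed. A real system would replan at that point instead of stopping; the sketch leaves that out to stay short.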

Cite

Text

Zhao et al. "Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation." International Conference on Learning Representations, 2026.

Markdown

[Zhao et al. "Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhao2026iclr-sim2real/)

BibTeX

@inproceedings{zhao2026iclr-sim2real,
  title     = {{Sim2Real VLA: Zero-Shot Generalization of Synthesized Skills to Realistic Manipulation}},
  author    = {Zhao, Runyi and Xu, Sheng and Jin, Ruixing and Deng, Yueci and Tai, Yunxin and Jia, Kui and Liu, Guiliang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhao2026iclr-sim2real/}
}