X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real

Abstract

Human videos offer a scalable way to train robot manipulation policies, but lack the action labels needed by standard imitation learning algorithms. Existing cross-embodiment approaches try to map human motion to robot actions, but often fail when the embodiments differ significantly. We propose X-Sim, a real-to-sim-to-real framework that uses object motion as a dense and transferable signal for learning robot policies. X-Sim starts by reconstructing a photorealistic simulation from an RGBD human video and tracking object trajectories to define object-centric rewards. These rewards are used to train a reinforcement learning (RL) policy in simulation. The learned policy is then distilled into an image-conditioned diffusion policy using synthetic rollouts rendered with varied viewpoints and lighting. To transfer to the real world, X-Sim introduces an online domain adaptation technique that aligns real and simulated observations during deployment. Importantly, X-Sim does not require any robot teleoperation data. We evaluate it across 5 manipulation tasks in 2 environments and show that it: (1) improves task progress by 30% on average over hand-tracking and sim-to-real baselines, (2) matches behavior cloning with 10x less data collection, and (3) generalizes to new camera viewpoints and test-time changes.
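The abstract does not say what form the object-centric rewards take, so the following is only a minimal sketch of one plausible shape: a dense reward that grows as the simulated object's position approaches the position tracked from the human video at the same timestep. The function names, the exponential shaping, the `scale` parameter, and the position-only (no orientation) comparison are all assumptions for illustration, not the authors' formulation.

```python
import math
from typing import Sequence

# Hypothetical convention: a position is an (x, y, z) tuple in metres.
Position = Sequence[float]


def object_tracking_reward(sim_pos: Position, target_pos: Position,
                           scale: float = 10.0) -> float:
    """Dense reward in (0, 1]; equals 1.0 when the simulated object sits
    exactly at the target position tracked from the human video.

    The exponential shaping and `scale` are illustrative assumptions.
    """
    dist = math.dist(sim_pos, target_pos)  # Euclidean distance
    return math.exp(-scale * dist)


def trajectory_return(sim_traj: Sequence[Position],
                      human_traj: Sequence[Position],
                      scale: float = 10.0) -> float:
    """Sum of per-step rewards along two time-aligned object trajectories."""
    if len(sim_traj) != len(human_traj):
        raise ValueError("trajectories must be time-aligned and equal length")
    return sum(object_tracking_reward(s, t, scale)
               for s, t in zip(sim_traj, human_traj))
```

Because the reward depends only on the object's motion and not on the hand or gripper that moves it, a signal of this kind can in principle be shared between a human demonstrator and a robot, which is the property the abstract relies on.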

Cite

Text

Dan et al. "X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real." Proceedings of The 9th Conference on Robot Learning, 2025.

Markdown

[Dan et al. "X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real." Proceedings of The 9th Conference on Robot Learning, 2025.](https://mlanthology.org/corl/2025/dan2025corl-xsim/)

BibTeX

@inproceedings{dan2025corl-xsim,
  title     = {{X-Sim: Cross-Embodiment Learning via Real-to-Sim-to-Real}},
  author    = {Dan, Prithwish and Kedia, Kushal and Chao, Angela and Duan, Edward and Pace, Maximus Adrian and Ma, Wei-Chiu and Choudhury, Sanjiban},
  booktitle = {Proceedings of The 9th Conference on Robot Learning},
  year      = {2025},
  pages     = {816--833},
  volume    = {305},
  url       = {https://mlanthology.org/corl/2025/dan2025corl-xsim/}
}