GRS: Generating Robotic Simulation Tasks from Real-World Images
Abstract
We introduce GRS (Generating Robotic Simulation tasks), a system that addresses the real-to-sim problem for robotic simulation. GRS creates digital twin simulations from single RGB-D observations, paired with solvable tasks for virtual agent training. Using vision-language models (VLMs), our pipeline operates in three stages: 1) scene comprehension with SAM2 for segmentation and object description, 2) matching objects with simulation-ready assets, and 3) generating appropriate tasks. We ensure simulation-task alignment through generated test suites and introduce a router that iteratively refines both simulation and test code. Experiments demonstrate our system's effectiveness in object correspondence and task environment generation through our novel router mechanism.
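The three-stage pipeline and the router's refine-until-tests-pass loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: every function, asset path, and data structure here is a hypothetical stand-in (e.g., the segmentation and LLM-driven code repair steps are mocked with fixed values).

```python
# Hypothetical sketch of a GRS-style pipeline; all names are illustrative
# assumptions, not the paper's API.

def comprehend_scene(rgbd_image):
    # Stage 1: segment the observation (the paper uses SAM2) and describe
    # the objects. A fixed object list stands in for the real output.
    return ["mug", "table"]

def match_assets(objects):
    # Stage 2: map each described object to a simulation-ready asset.
    catalog = {"mug": "assets/mug.usd", "table": "assets/table.usd"}
    return {obj: catalog[obj] for obj in objects if obj in catalog}

def generate_task(assets):
    # Stage 3: propose a task together with a test suite that checks
    # the generated simulation actually supports the task.
    sim_code = {"assets": assets, "goal": "place mug on table", "valid": False}
    tests = [lambda sim: sim["valid"]]
    return sim_code, tests

def route_and_refine(sim_code, tests, max_iters=3):
    # Router: rerun the test suite and patch the simulation code until
    # all tests pass or the iteration budget is exhausted.
    for _ in range(max_iters):
        if all(test(sim_code) for test in tests):
            return sim_code, True
        sim_code["valid"] = True  # stand-in for an LLM-driven code fix
    return sim_code, all(test(sim_code) for test in tests)

objects = comprehend_scene(rgbd_image=None)
assets = match_assets(objects)
sim_code, tests = generate_task(assets)
sim_code, solved = route_and_refine(sim_code, tests)
print(solved)
```

The key design point the sketch tries to capture is that the test suite, not the generator, is the arbiter of simulation-task alignment: the router keeps editing until the tests pass.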
Cite
Text
Zook et al. "GRS: Generating Robotic Simulation Tasks from Real-World Images." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.

Markdown
[Zook et al. "GRS: Generating Robotic Simulation Tasks from Real-World Images." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2025.](https://mlanthology.org/cvprw/2025/zook2025cvprw-grs/)

BibTeX
@inproceedings{zook2025cvprw-grs,
  title = {{GRS: Generating Robotic Simulation Tasks from Real-World Images}},
  author = {Zook, Alex and Sun, Fan-Yun and Spjut, Josef B. and Blukis, Valts and Birchfield, Stan and Tremblay, Jonathan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year = {2025},
  pages = {594--603},
  url = {https://mlanthology.org/cvprw/2025/zook2025cvprw-grs/}
}