METASCENES: Towards Automated Replica Creation for Real-World 3D Scans

Abstract

Embodied AI (EAI) research requires high-quality, diverse 3D scenes to support skill acquisition, sim-to-real transfer, and generalization. Meeting these standards, however, requires faithfully replicating the diversity of real-world objects. As existing datasets show, this process relies heavily on artist-driven design, which demands substantial human effort and is difficult to scale. To produce realistic, interactive 3D scenes at scale, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans that contains 15,366 objects spanning 831 fine-grained categories. We then introduce Scan2Sim, a robust multi-modal alignment model that enables automated, high-quality asset replacement, removing the reliance on artist-driven design when scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene synthesis task focused on the layout of small items for robotic manipulation, and a domain transfer task in vision-and-language navigation (VLN) that tests cross-domain transfer. The results show that MetaScenes can enhance EAI by supporting more generalizable agent learning and sim-to-real applications, opening new possibilities for EAI research.

Cite

Text

Yu et al. "METASCENES: Towards Automated Replica Creation for Real-World 3D Scans." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00163

Markdown

[Yu et al. "METASCENES: Towards Automated Replica Creation for Real-World 3D Scans." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/yu2025cvpr-metascenes/) doi:10.1109/CVPR52734.2025.00163

BibTeX

@inproceedings{yu2025cvpr-metascenes,
  title     = {{METASCENES: Towards Automated Replica Creation for Real-World 3D Scans}},
  author    = {Yu, Huangyue and Jia, Baoxiong and Chen, Yixin and Yang, Yandan and Li, Puhao and Su, Rongpeng and Li, Jiaxin and Li, Qing and Liang, Wei and Zhu, Song-Chun and Liu, Tengyu and Huang, Siyuan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2025},
  pages     = {1667--1679},
  doi       = {10.1109/CVPR52734.2025.00163},
  url       = {https://mlanthology.org/cvpr/2025/yu2025cvpr-metascenes/}
}