UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos

Abstract

Urban embodied AI agents, ranging from delivery robots to quadrupeds, are increasingly populating our cities, navigating chaotic streets to provide last-mile connectivity. Training such agents requires diverse, high-fidelity urban environments to scale, yet existing human-crafted or procedurally generated simulation scenes either lack scalability or fail to capture real-world complexity. We introduce UrbanVerse, a data-driven real-to-sim system that converts crowd-sourced city-tour videos into physics-aware, interactive simulation scenes. UrbanVerse consists of: (i) UrbanVerse-100K, a repository of 100k+ annotated urban 3D assets with semantic and physical attributes, and (ii) UrbanVerse-Gen, an automatic pipeline that extracts scene layouts from video and instantiates metric-scale 3D simulations using retrieved assets. Running in IsaacSim, UrbanVerse offers 160 high-quality constructed scenes from 24 countries, along with a curated benchmark of 10 artist-designed test scenes. Experiments show that UrbanVerse scenes preserve real-world semantics and layouts, achieving human-evaluated realism comparable to manually crafted scenes. In urban navigation, policies trained in UrbanVerse exhibit scaling power laws and strong generalization, improving success by +6.3% in simulation and +30.1% in zero-shot sim-to-real transfer compared to prior methods, and accomplishing a 300 m real-world mission with only two interventions.

Cite

Text

Liu et al. "UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos." International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-urbanverse/)

BibTeX

@inproceedings{liu2026iclr-urbanverse,
  title     = {{UrbanVerse: Scaling Urban Simulation by Watching City-Tour Videos}},
  author    = {Liu, Mingxuan and He, Honglin and Ricci, Elisa and Wu, Wayne and Zhou, Bolei},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-urbanverse/}
}