WorldScore: A Unified Evaluation Benchmark for World Generation

Abstract

We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metric evaluates generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 20 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models. Our dataset, evaluation code, and leaderboard can be found at https://haoyi-duan.github.io/WorldScore/.
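
The abstract describes the WorldScore metric as combining three aspects: controllability, quality, and dynamics. As a purely illustrative sketch (not the paper's actual formula), the snippet below shows one way per-aspect sub-scores could be averaged into a single score; the sub-metric names, values, and the simple averaging scheme are all hypothetical.

# Illustrative sketch only: a hypothetical aggregation of per-aspect scores
# into a single benchmark score. The aspect names follow the abstract
# (controllability, quality, dynamics); the sub-metrics, weights, and the
# averaging scheme are assumptions, not the paper's actual metric.
from statistics import mean

def aggregate_worldscore(aspect_scores: dict[str, list[float]]) -> dict[str, float]:
    """Average sub-metric scores within each aspect, then across aspects."""
    per_aspect = {aspect: mean(scores) for aspect, scores in aspect_scores.items()}
    per_aspect["overall"] = mean(per_aspect.values())
    return per_aspect

if __name__ == "__main__":
    # Hypothetical sub-metric scores in [0, 1] for one model.
    scores = {
        "controllability": [0.71, 0.64],  # e.g., camera control, content alignment
        "quality":         [0.82, 0.77],  # e.g., consistency, perceptual quality
        "dynamics":        [0.58, 0.61],  # e.g., motion accuracy, smoothness
    }
    print(aggregate_worldscore(scores))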

Cite

Text

Duan et al. "WorldScore: A Unified Evaluation Benchmark for World Generation." International Conference on Computer Vision, 2025.

Markdown

[Duan et al. "WorldScore: A Unified Evaluation Benchmark for World Generation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/duan2025iccv-worldscore/)

BibTeX

@inproceedings{duan2025iccv-worldscore,
  title     = {{WorldScore: A Unified Evaluation Benchmark for World Generation}},
  author    = {Duan, Haoyi and Yu, Hong-Xing and Chen, Sirui and Fei-Fei, Li and Wu, Jiajun},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {27713--27724},
  url       = {https://mlanthology.org/iccv/2025/duan2025iccv-worldscore/}
}