WorldScore: A Unified Evaluation Benchmark for World Generation
Abstract
We introduce the WorldScore benchmark, the first unified benchmark for world generation. We decompose world generation into a sequence of next-scene generation tasks with explicit camera trajectory-based layout specifications, enabling unified evaluation of diverse approaches from 3D and 4D scene generation to video generation models. The WorldScore benchmark encompasses a curated dataset of 3,000 test examples that span diverse worlds: static and dynamic, indoor and outdoor, photorealistic and stylized. The WorldScore metric evaluates generated worlds through three key aspects: controllability, quality, and dynamics. Through extensive evaluation of 20 representative models, including both open-source and closed-source ones, we reveal key insights and challenges for each category of models. Our dataset, evaluation code, and leaderboard can be found at https://haoyi-duan.github.io/WorldScore/.
Cite
Text
Duan et al. "WorldScore: A Unified Evaluation Benchmark for World Generation." International Conference on Computer Vision, 2025.
Markdown
[Duan et al. "WorldScore: A Unified Evaluation Benchmark for World Generation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/duan2025iccv-worldscore/)
BibTeX
@inproceedings{duan2025iccv-worldscore,
title = {{WorldScore: A Unified Evaluation Benchmark for World Generation}},
author = {Duan, Haoyi and Yu, Hong-Xing and Chen, Sirui and Fei-Fei, Li and Wu, Jiajun},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {27713-27724},
url = {https://mlanthology.org/iccv/2025/duan2025iccv-worldscore/}
}