WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving

Abstract

Recent advances in driving-scene generation and reconstruction have demonstrated significant potential for enhancing autonomous driving systems by producing scalable and controllable training data. Existing generation methods primarily focus on synthesizing diverse and high-fidelity driving videos; however, due to limited 3D consistency and sparse viewpoint coverage, they struggle to support convenient and high-quality novel-view synthesis (NVS). Conversely, recent 3D/4D reconstruction approaches have significantly improved NVS for real-world driving scenes, yet inherently lack generative capabilities. To overcome this dilemma between scene generation and reconstruction, we propose WorldSplat, a novel feed-forward framework for 4D driving-scene generation. Our approach generates consistent multi-track videos through two key steps: (i) we introduce a 4D-aware latent diffusion model that integrates multi-modal information to produce pixel-aligned 4D Gaussians in a feed-forward manner; (ii) we then refine the novel-view videos rendered from these Gaussians using an enhanced video diffusion model. Extensive experiments conducted on benchmark datasets demonstrate that WorldSplat generates high-fidelity, temporally and spatially consistent multi-track novel-view driving videos.

Cite

Text

Zhu et al. "WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving." International Conference on Learning Representations, 2026.

Markdown

[Zhu et al. "WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhu2026iclr-worldsplat/)

BibTeX

@inproceedings{zhu2026iclr-worldsplat,
  title     = {{WorldSplat: Gaussian-Centric Feed-Forward 4D Scene Generation for Autonomous Driving}},
  author    = {Zhu, Ziyue and Wu, Zhanqian and Zhu, Zhenxin and Zhou, Lijun and Sun, Haiyang and Wang, Bing and Ma, Kun and Chen, Guang and Ye, Hangjun and Xie, Jin and Yang, Jian},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhu2026iclr-worldsplat/}
}