DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation
Abstract
World models have demonstrated superiority in autonomous driving, particularly in the generation of multi-view driving videos. However, significant challenges still exist in generating customized driving videos. In this paper, we propose DriveDreamer-2, which incorporates a Large Language Model (LLM) to facilitate the creation of user-defined driving videos. Specifically, a trajectory generation function library is developed to produce trajectories that conform to user descriptions. Subsequently, an HDMap generator is designed to learn the mapping from trajectories to road structures. Finally, we propose the Unified Multi-View Model (UniMVM) to enhance temporal and spatial coherence in the generated multi-view driving videos. To the best of our knowledge, DriveDreamer-2 is the first world model to generate customized driving videos, and it can generate uncommon driving videos (e.g., vehicles abruptly cutting in) in a user-friendly manner. In addition, experimental results demonstrate that the generated videos enhance the training of driving perception methods (e.g., 3D detection and tracking). Furthermore, the video generation quality of DriveDreamer-2 surpasses other state-of-the-art methods, achieving FID and FVD scores of 11.2 and 55.7, respectively, which represent relative improvements of ~30% and ~50%.
Cite
Text
Zhao et al. "DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33130
Markdown
[Zhao et al. "DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhao2025aaai-drivedreamer/) doi:10.1609/AAAI.V39I10.33130
BibTeX
@inproceedings{zhao2025aaai-drivedreamer,
title = {{DriveDreamer-2: LLM-Enhanced World Models for Diverse Driving Video Generation}},
author = {Zhao, Guosheng and Wang, Xiaofeng and Zhu, Zheng and Chen, Xinze and Huang, Guan and Bao, Xiaoyi and Wang, Xingang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {10412--10420},
doi = {10.1609/AAAI.V39I10.33130},
url = {https://mlanthology.org/aaai/2025/zhao2025aaai-drivedreamer/}
}