DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving

Abstract

World models, especially in autonomous driving, are drawing extensive attention due to their capacity for comprehending driving environments. An established world model holds immense potential for generating high-quality driving videos and driving policies for safe maneuvering. However, a critical limitation of existing research is its predominant focus on gaming environments or simulated settings, which lack the representation of real-world driving scenarios. We therefore introduce DriveDreamer, a pioneering world model entirely derived from real-world driving scenarios. Because modeling the world in intricate driving scenes entails an overwhelming search space, we propose harnessing a powerful diffusion model to construct a comprehensive representation of the complex environment. Furthermore, we introduce a two-stage training pipeline. In the first stage, DriveDreamer acquires a deep understanding of structured traffic constraints, while the second stage equips it with the ability to anticipate future states. Extensive experiments verify that DriveDreamer empowers both driving video generation and action prediction while faithfully capturing real-world traffic constraints. In addition, videos generated by DriveDreamer significantly enhance the training of driving perception methods.

Cite

Text

Wang et al. "DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73195-2_4

Markdown

[Wang et al. "DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/wang2024eccv-drivedreamer/) doi:10.1007/978-3-031-73195-2_4

BibTeX

@inproceedings{wang2024eccv-drivedreamer,
  title     = {{DriveDreamer: Towards Real-World-Driven World Models for Autonomous Driving}},
  author    = {Wang, Xiaofeng and Zhu, Zheng and Huang, Guan and Chen, Xinze and Zhu, Jiagang and Lu, Jiwen},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73195-2_4},
  url       = {https://mlanthology.org/eccv/2024/wang2024eccv-drivedreamer/}
}