Enhancing End-to-End Autonomous Driving with Latent World Model

Abstract

In autonomous driving, end-to-end planners consume raw sensor data directly, enabling them to extract richer scene features and lose less information than traditional planners. This raises a crucial research question: how can we develop better scene feature representations to fully leverage sensor data in end-to-end driving? Self-supervised learning methods have shown great success in learning rich feature representations in NLP and computer vision. Inspired by this, we propose a novel self-supervised learning approach using the LAtent World model (LAW) for end-to-end driving. LAW predicts future latent scene features based on current features and ego trajectories. This self-supervised task can be seamlessly integrated into both perception-free and perception-based frameworks, improving scene feature learning while jointly optimizing trajectory prediction. LAW achieves state-of-the-art performance across multiple benchmarks, including the real-world open-loop benchmark nuScenes, NAVSIM, and the simulator-based closed-loop benchmark CARLA. The code will be released.
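The abstract describes the mechanism only in prose. As a rough, non-authoritative sketch of the idea (all module names, tensor shapes, dimensions, and the MSE objective below are assumptions for illustration, not the authors' implementation), a latent world model that predicts the next frame's scene features from the current features and the ego trajectory might look like this in PyTorch:

import torch
import torch.nn as nn

class LatentWorldModel(nn.Module):
    """Hypothetical sketch of the LAW idea: predict the next frame's
    latent scene features from the current features and ego trajectory.
    Dimensions and architecture are assumptions, not the paper's."""

    def __init__(self, feat_dim: int = 256, num_waypoints: int = 6):
        super().__init__()
        # Embed the planned ego waypoints (x, y per step) into feature space.
        self.action_embed = nn.Linear(num_waypoints * 2, feat_dim)
        # A single transformer layer fuses scene tokens with the action embedding.
        self.predictor = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, batch_first=True
        )

    def forward(self, scene_feat: torch.Tensor, ego_traj: torch.Tensor) -> torch.Tensor:
        # scene_feat: (B, N, C) latent scene tokens at time t
        # ego_traj:   (B, num_waypoints, 2) predicted ego waypoints
        action = self.action_embed(ego_traj.flatten(1)).unsqueeze(1)  # (B, 1, C)
        # Condition every scene token on the ego action, then predict t+1 features.
        return self.predictor(scene_feat + action)

def latent_prediction_loss(model, feat_t, traj_t, feat_t1):
    # Self-supervised target: the latent actually extracted from frame t+1,
    # so no manual labels are needed (MSE is an assumed choice of objective).
    pred = model(feat_t, traj_t)
    return nn.functional.mse_loss(pred, feat_t1.detach())

In such a setup, the supervisory signal comes from the sensor stream itself: the predicted latent for frame t+1 is regressed toward the latent encoded from the next frame, and this auxiliary loss can be added alongside the trajectory loss of an existing end-to-end planner.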

Cite

Text

Li et al. "Enhancing End-to-End Autonomous Driving with Latent World Model." International Conference on Learning Representations, 2025.

Markdown

[Li et al. "Enhancing End-to-End Autonomous Driving with Latent World Model." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/li2025iclr-enhancing-b/)

BibTeX

@inproceedings{li2025iclr-enhancing-b,
  title     = {{Enhancing End-to-End Autonomous Driving with Latent World Model}},
  author    = {Li, Yingyan and Fan, Lue and He, Jiawei and Wang, Yuqi and Chen, Yuntao and Zhang, Zhaoxiang and Tan, Tieniu},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/li2025iclr-enhancing-b/}
}