Bird's-Eye-View Informed Reasoning Driver
Abstract
Motion planning in complex environments remains a core challenge for autonomous driving. While existing rule-based or imitation learning-based motion planning methods perform well in common scenarios, they often struggle with complex, long-tail scenarios. To address this problem, we introduce the Bird's-eye-view Informed Reasoning Driver (BIRDriver), a hierarchical framework that combines a Vision-Language Model (VLM) with a motion planner. BIRDriver leverages the commonsense reasoning capabilities of the VLM to effectively handle these challenging long-tail scenarios. Unlike prior methods that require domain-specific encoders and costly alignment, our approach compresses the environment into a single-frame bird's-eye-view (BEV) map, a paradigm that enables the model to fully leverage its knowledge from internet-scale pre-training. It then generates high-level key points, which are encoded and passed to the motion planner to produce the final trajectory. However, a major challenge is that standard VLMs struggle to generate the precise numerical coordinates required for such key points. We address this limitation by fine-tuning them on a composite dataset of three auxiliary types to enhance spatial localization, scene understanding, and key-point generation, complemented by a token-level weighted mechanism for improved numerical precision. Experiments on the nuPlan dataset demonstrate that BIRDriver outperforms the base motion planner in most cases on both Test14-hard and Test14-random benchmarks, and achieves state-of-the-art (SOTA) performance on the InterPlan long-tail benchmark.
Cite
Text
Wang et al. "Bird's-Eye-View Informed Reasoning Driver." International Conference on Learning Representations, 2026.Markdown
[Wang et al. "Bird's-Eye-View Informed Reasoning Driver." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-bird/)BibTeX
@inproceedings{wang2026iclr-bird,
title = {{Bird's-Eye-View Informed Reasoning Driver}},
author = {Wang, Yinuo and Tan, Mining and Zhong, Yuanxin and Zhitao, Wang and Cheng, Siyuan},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/wang2026iclr-bird/}
}