Bird's-Eye-View Informed Reasoning Driver

Wang, Yinuo; Tan, Mining; Zhong, Yuanxin; Zhitao, Wang; Cheng, Siyuan

Bird's-Eye-View Informed Reasoning Driver

Yinuo Wang, Mining Tan, Yuanxin Zhong, Wang Zhitao, Siyuan Cheng

ICLR 2026

/iclr/2026/wang2026iclr-bird/

Abstract

Motion planning in complex environments remains a core challenge for autonomous driving. While existing rule-based or imitation learning-based motion planning methods perform well in common scenarios, they often struggle with complex, long-tail scenarios. To address this problem, we introduce the Bird's-eye-view Informed Reasoning Driver (BIRDriver), a hierarchical framework that combines a Vision-Language Model (VLM) with a motion planner. BIRDriver leverages the commonsense reasoning capabilities of the VLM to effectively handle these challenging long-tail scenarios. Unlike prior methods that require domain-specific encoders and costly alignment, our approach compresses the environment into a single-frame bird's-eye-view (BEV) map, a paradigm that enables the model to fully leverage its knowledge from internet-scale pre-training. It then generates high-level key points, which are encoded and passed to the motion planner to produce the final trajectory. However, a major challenge is that standard VLMs struggle to generate the precise numerical coordinates required for such key points. We address this limitation by fine-tuning them on a composite dataset of three auxiliary types to enhance spatial localization, scene understanding, and key-point generation, complemented by a token-level weighted mechanism for improved numerical precision. Experiments on the nuPlan dataset demonstrate that BIRDriver outperforms the base motion planner in most cases on both Test14-hard and Test14-random benchmarks, and achieves state-of-the-art (SOTA) performance on the InterPlan long-tail benchmark.

PDF ICLR OpenReview Semantic Scholar

Cite

Text

Wang et al. "Bird's-Eye-View Informed Reasoning Driver." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "Bird's-Eye-View Informed Reasoning Driver." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-bird/)

BibTeX

@inproceedings{wang2026iclr-bird,
  title     = {{Bird's-Eye-View Informed Reasoning Driver}},
  author    = {Wang, Yinuo and Tan, Mining and Zhong, Yuanxin and Zhitao, Wang and Cheng, Siyuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-bird/}
}