DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking

Zheng, Weicheng; Mao, Xiaofei; Ye, Nanfei; Li, Pengxiang; Zhan, Kun; Lang, XianPeng; Zhao, Hang

ICLR 2026

Abstract

The advent of Vision-Language Models (VLMs) has significantly advanced end-to-end autonomous driving, demonstrating powerful reasoning abilities for high-level behavior planning tasks. However, existing methods are often constrained by a passive perception paradigm, relying solely on text-based reasoning. This passivity restricts the model’s capacity to actively seek crucial visual evidence when faced with uncertainty. To address this, we introduce DriveAgent-R1, an autonomous driving agent capable of active perception for planning. In complex scenarios, DriveAgent-R1 proactively invokes tools to perform visual reasoning, firmly grounding its decisions in visual evidence, thereby enhancing both interpretability and reliability. Furthermore, we propose a hybrid thinking framework, inspired by human driver cognitive patterns, allowing the agent to adaptively switch between efficient text-only reasoning and robust tool-augmented visual reasoning based on scene complexity. This capability is cultivated through a three-stage progressive training strategy, featuring a core Cascaded Reinforcement Learning (Cascaded RL) phase. Extensive experiments on the Drive-Internal dataset, which is rich in long-tail scenarios, and the public nuScenes dataset show that, with only 3B parameters, DriveAgent-R1 achieves competitive performance comparable to top closed model systems such as GPT-5 and to human driving proficiency while remaining deployment-friendly, offering a proven path toward building more intelligent autonomous driving systems.

Cite

Text

Zheng et al. "DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking." International Conference on Learning Representations, 2026.

Markdown

[Zheng et al. "DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zheng2026iclr-driveagentr1/)

BibTeX

@inproceedings{zheng2026iclr-driveagentr1,
  title     = {{DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking}},
  author    = {Zheng, Weicheng and Mao, Xiaofei and Ye, Nanfei and Li, Pengxiang and Zhan, Kun and Lang, XianPeng and Zhao, Hang},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zheng2026iclr-driveagentr1/}
}