DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking
Abstract
The advent of Vision-Language Models (VLMs) has significantly advanced end-to-end autonomous driving, demonstrating powerful reasoning abilities for high-level behavior planning tasks. However, existing methods are often constrained by a passive perception paradigm, relying solely on text-based reasoning. This passivity restricts the model’s capacity to actively seek crucial visual evidence when faced with uncertainty. To address this, we introduce DriveAgent-R1, an autonomous driving agent capable of active perception for planning. In complex scenarios, DriveAgent-R1 proactively invokes tools to perform visual reasoning, firmly grounding its decisions in visual evidence, thereby enhancing both interpretability and reliability. Furthermore, we propose a hybrid thinking framework, inspired by human driver cognitive patterns, allowing the agent to adaptively switch between efficient text-only reasoning and robust tool-augmented visual reasoning based on scene complexity. This capability is cultivated through a three-stage progressive training strategy, featuring a core Cascaded Reinforcement Learning (Cascaded RL) phase. Extensive experiments on the Drive-Internal dataset, which is rich in long-tail scenarios, and the public nuScenes dataset show that, with only 3B parameters, DriveAgent-R1 achieves performance competitive with top closed-model systems such as GPT-5 and with human driving proficiency while remaining deployment-friendly, offering a proven path toward building more intelligent autonomous driving systems.
Cite
Text
Zheng et al. "DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking." International Conference on Learning Representations, 2026.
Markdown
[Zheng et al. "DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zheng2026iclr-driveagentr1/)
BibTeX
@inproceedings{zheng2026iclr-driveagentr1,
title = {{DriveAgent-R1: Advancing VLM-Based Autonomous Driving with Active Perception and Hybrid Thinking}},
author = {Zheng, Weicheng and Mao, Xiaofei and Ye, Nanfei and Li, Pengxiang and Zhan, Kun and Lang, XianPeng and Zhao, Hang},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zheng2026iclr-driveagentr1/}
}