World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving

Abstract

Multi-modal Large Language Models (MLLMs) with extensive world knowledge have revitalized autonomous driving, particularly in reasoning tasks within perceivable regions. However, when faced with perception-limited areas (dynamic or static occlusion regions), MLLMs struggle to effectively integrate perception ability with world knowledge for reasoning. These perception-limited regions can conceal crucial safety information, especially for vulnerable road users. In this paper, we propose a framework that aims to improve autonomous driving performance under perception-limited conditions by enhancing the integration of perception capabilities and world knowledge. Specifically, we propose a plug-and-play instruction-guided interaction module that bridges modality gaps and significantly reduces the input sequence length, allowing it to adapt effectively to multi-view video inputs. Furthermore, to better integrate world knowledge with driving-related tasks, we have collected and refined a large-scale multi-modal dataset that includes 2 million natural language QA pairs and 1.7 million grounding task samples. To evaluate the model’s utilization of world knowledge, we introduce an object-level risk assessment dataset comprising 200K QA pairs, where the questions require multi-step reasoning that leverages world knowledge. Extensive experiments validate the effectiveness of our proposed method.

Cite

Text

Zhai et al. "World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I9.33067

Markdown

[Zhai et al. "World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhai2025aaai-world/) doi:10.1609/AAAI.V39I9.33067

BibTeX

@inproceedings{zhai2025aaai-world,
  title     = {{World Knowledge-Enhanced Reasoning Using Instruction-Guided Interactor in Autonomous Driving}},
  author    = {Zhai, Mingliang and Li, Cheng and Guo, Zengyuan and Yang, Ningrui and Qin, Xiameng and Zhao, Sanyuan and Han, Junyu and Tao, Ji and Wu, Yuwei and Jia, Yunde},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {9842--9850},
  doi       = {10.1609/AAAI.V39I9.33067},
  url       = {https://mlanthology.org/aaai/2025/zhai2025aaai-world/}
}