On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent

Abstract

The development of autonomous driving technology depends on merging perception, decision, and control systems. Traditional strategies have struggled to understand complex driving environments and other road users' intent. This bottleneck, especially in constructing common sense reasoning and nuanced scene understanding, affects the safe and reliable operations of autonomous vehicles. The introduction of Visual Language Models (VLM) opens up possibilities for fully autonomous driving. This report evaluates the potential of GPT-4V(ision), the latest state-of-the-art VLM, as an autonomous driving agent. The evaluation primarily assesses the model's ultimate ability to act as a driving agent under varying conditions, while also considering its capacity to understand driving scenes and make decisions. Findings show that GPT-4V outperforms existing systems in scene understanding and causal reasoning. It has the potential in handling unexpected scenarios, understanding intentions, and making informed decisions. However, limitations remain in direction determination, traffic light recognition, vision grounding, and spatial reasoning tasks, highlighting the need for further research. The project is now available on GitHub for interested parties to access and utilize: https://github.com/PJLab-ADG/GPT4V-AD-Exploration.

Cite

Text

Wen et al. "On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent." ICLR 2024 Workshops: LLMAgents, 2024.

Markdown

[Wen et al. "On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent." ICLR 2024 Workshops: LLMAgents, 2024.](https://mlanthology.org/iclrw/2024/wen2024iclrw-road/)

BibTeX

@inproceedings{wen2024iclrw-road,
  title     = {{On the Road with GPT-4V(ision): Explorations of Utilizing Visual-Language Model as Autonomous Driving Agent}},
  author    = {Wen, Licheng and Yang, Xuemeng and Fu, Daocheng and Wang, Xiaofeng and Cai, Pinlong and Li, Xin and Ma, Tao and Li, Yingxuan and Xu, Linran and Shang, Dengke and Zhu, Zheng and Sun, Shaoyan and Bai, Yeqi and Cai, Xinyu and Dou, Min and Hu, Shuanglu and Shi, Botian and Qiao, Yu},
  booktitle = {ICLR 2024 Workshops: LLMAgents},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/wen2024iclrw-road/}
}