Language Prompt for Autonomous Driving

Abstract

A new trend in the computer vision community is to capture objects of interest by following flexible human commands expressed as natural language prompts. However, progress in using language prompts for driving scenarios has been held back by the scarcity of paired prompt-instance data. To address this challenge, we propose NuPrompt, the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space. It extends the nuScenes dataset with a total of 40,147 language descriptions, each referring to an average of 7.4 object tracklets. Based on the object-text pairs from the new benchmark, we formulate a novel prompt-based driving task, i.e., using a language prompt to predict the trajectories of the described objects across views and frames. Furthermore, we provide a simple end-to-end Transformer-based baseline model named PromptTrack. Experiments show that PromptTrack achieves impressive performance on NuPrompt. We hope this work can provide new insights for the self-driving community.
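To make the task formulation concrete, the sketch below models one prompt-to-tracklet pairing and the "average tracklets per prompt" statistic quoted above. It is a minimal illustration under stated assumptions, not the official NuPrompt format or the PromptTrack implementation: the `PromptSample` class, its fields, the `Tracklet` alias, and both helper functions are hypothetical names chosen for this example, and trajectories are simplified to per-frame 3D box centres that are assumed to have already been fused from the multi-view cameras.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical schema (NOT the official NuPrompt format): one language
# prompt that refers to several object tracklets in a nuScenes scene.
@dataclass
class PromptSample:
    scene_token: str                  # identifier of the driving scene
    prompt: str                       # natural-language description
    instance_tokens: List[str] = field(default_factory=list)  # referred objects

# Simplified tracklet: frame index -> 3D box centre (x, y, z).
# Assumes multi-view detections were already fused into one 3D track.
Tracklet = Dict[int, Tuple[float, float, float]]


def referred_trajectories(sample: PromptSample,
                          tracks: Dict[str, Tracklet]) -> Dict[str, Tracklet]:
    """Keep only the trajectories of objects the prompt refers to."""
    return {tok: tracks[tok] for tok in sample.instance_tokens if tok in tracks}


def mean_tracklets_per_prompt(samples: List[PromptSample]) -> float:
    """Average number of referred tracklets per prompt (7.4 on NuPrompt)."""
    if not samples:
        return 0.0
    return sum(len(s.instance_tokens) for s in samples) / len(samples)


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the dataset.
    s1 = PromptSample("scene-a", "cars turning left", ["car1", "car2"])
    s2 = PromptSample("scene-a", "pedestrians crossing", ["ped1", "ped2", "ped3", "ped4"])
    tracks = {"car1": {0: (1.0, 2.0, 0.0), 1: (1.5, 2.2, 0.0)}}
    print(referred_trajectories(s1, tracks))
    print(mean_tracklets_per_prompt([s1, s2]))
```

Here a prompt maps to a set of tracklets rather than a single box, which mirrors the one-to-many referring setting the abstract describes; a model for the task would predict `tracks` from images and keep only those matching the prompt.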

Cite

Text

Wu et al. "Language Prompt for Autonomous Driving." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32902

Markdown

[Wu et al. "Language Prompt for Autonomous Driving." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wu2025aaai-language/) doi:10.1609/AAAI.V39I8.32902

BibTeX

@inproceedings{wu2025aaai-language,
  title     = {{Language Prompt for Autonomous Driving}},
  author    = {Wu, Dongming and Han, Wencheng and Liu, Yingfei and Wang, Tiancai and Xu, Cheng-Zhong and Zhang, Xiangyu and Shen, Jianbing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8359--8367},
  doi       = {10.1609/AAAI.V39I8.32902},
  url       = {https://mlanthology.org/aaai/2025/wu2025aaai-language/}
}