Language-Guided Traffic Simulation via Scene-Level Diffusion

Abstract

Realistic and controllable traffic simulation is a core capability needed to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing such a model requires tackling two challenges: building a realistic and controllable traffic model backbone, and devising an effective method to interface with the traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user’s query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
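To make the guidance mechanism in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of loss-guided diffusion sampling. The names (speed_limit_loss, guided_denoise_step), the identity stand-in denoiser, and the trajectory shapes are illustrative assumptions, not the paper's implementation; the sketch only shows the general technique of steering a diffusion model's predictions with the gradient of a scene-level loss, such as one an LLM might emit for a query like "vehicles should stay under 15 m/s".

import torch

def speed_limit_loss(traj, max_speed=15.0, dt=0.1):
    # Hypothetical guidance loss: penalize speeds above a limit.
    # traj: (num_agents, horizon, 2) xy positions per timestep.
    vel = (traj[:, 1:] - traj[:, :-1]) / dt   # finite-difference velocities
    speed = vel.norm(dim=-1)                  # (num_agents, horizon - 1)
    return torch.relu(speed - max_speed).mean()

def guided_denoise_step(denoiser, x_t, t, guidance_loss, scale=1.0):
    # One reverse-diffusion step with gradient guidance: nudge the noisy
    # sample down the gradient of the loss on the predicted clean trajectory.
    x_t = x_t.detach().requires_grad_(True)
    x0_pred = denoiser(x_t, t)                # predicted clean trajectories
    grad = torch.autograd.grad(guidance_loss(x0_pred), x_t)[0]
    return (x_t - scale * grad).detach()      # guided sample for the next step

if __name__ == "__main__":
    # Toy usage: identity "denoiser", 4 agents, 20-step horizon.
    denoiser = lambda x, t: x
    x = torch.randn(4, 20, 2) * 5.0
    for t in reversed(range(10)):
        x = guided_denoise_step(denoiser, x, t, speed_limit_loss, scale=0.5)
    print("final speed penalty:", speed_limit_loss(x).item())

In this style of guidance, the diffusion model's learned prior keeps trajectories realistic while the loss gradient pulls each denoising step toward query compliance, so no retraining is needed per query.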

Cite

Text

Zhong et al. "Language-Guided Traffic Simulation via Scene-Level Diffusion." Conference on Robot Learning, 2023.

Markdown

[Zhong et al. "Language-Guided Traffic Simulation via Scene-Level Diffusion." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/zhong2023corl-languageguided/)

BibTeX

@inproceedings{zhong2023corl-languageguided,
  title     = {{Language-Guided Traffic Simulation via Scene-Level Diffusion}},
  author    = {Zhong, Ziyuan and Rempe, Davis and Chen, Yuxiao and Ivanovic, Boris and Cao, Yulong and Xu, Danfei and Pavone, Marco and Ray, Baishakhi},
  booktitle = {Conference on Robot Learning},
  year      = {2023},
  pages     = {144--177},
  volume    = {229},
  url       = {https://mlanthology.org/corl/2023/zhong2023corl-languageguided/}
}