Language-Guided Traffic Simulation via Scene-Level Diffusion
Abstract
Realistic and controllable traffic simulation is a core capability that is necessary to accelerate autonomous vehicle (AV) development. However, current approaches for controlling learning-based traffic models require significant domain expertise and are difficult for practitioners to use. To remedy this, we present CTG++, a scene-level conditional diffusion model that can be guided by language instructions. Developing this requires tackling two challenges: the need for a realistic and controllable traffic model backbone, and an effective method to interface with a traffic model using language. To address these challenges, we first propose a scene-level diffusion model equipped with a spatio-temporal transformer backbone, which generates realistic and controllable traffic. We then harness a large language model (LLM) to convert a user’s query into a loss function, guiding the diffusion model towards query-compliant generation. Through comprehensive evaluation, we demonstrate the effectiveness of our proposed method in generating realistic, query-compliant traffic simulations.
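The guidance mechanism the abstract describes — converting a language query into a differentiable loss and following its gradient during sampling — can be sketched roughly as follows. This is a minimal illustration under assumed details, not CTG++'s actual architecture: the loss (a stand-in for what an LLM might emit from "vehicle 1 should drive slower than 5 m/s"), the finite-difference gradient, and the update rule are all illustrative choices, and the guidance is applied to a raw trajectory rather than inside a diffusion denoising loop.

```python
import numpy as np

def speed_limit_loss(traj, limit=5.0, dt=0.1):
    """Hypothetical query-derived loss: penalize speeds above a limit.

    traj: (T, 2) array of x,y positions for one agent.
    """
    vel = np.diff(traj, axis=0) / dt          # (T-1, 2) velocities
    speed = np.linalg.norm(vel, axis=-1)      # (T-1,) speeds
    return np.maximum(speed - limit, 0.0).sum()

def numerical_grad(loss_fn, traj, eps=1e-4):
    # Finite-difference gradient of the loss w.r.t. the trajectory
    # (an autodiff framework would be used in practice).
    grad = np.zeros_like(traj)
    for idx in np.ndindex(traj.shape):
        t = traj.copy()
        t[idx] += eps
        grad[idx] = (loss_fn(t) - loss_fn(traj)) / eps
    return grad

def guided_update(traj, guide_scale=0.01):
    # One guidance step: nudge the sample down the loss gradient.
    # In guided diffusion, a step like this perturbs the model's
    # denoised prediction at each diffusion timestep.
    return traj - guide_scale * numerical_grad(speed_limit_loss, traj)

# A straight-line trajectory moving too fast (10 m/s with dt=0.1).
traj = np.stack([np.arange(20) * 1.0, np.zeros(20)], axis=-1)
before = speed_limit_loss(traj)
for _ in range(50):
    traj = guided_update(traj)
after = speed_limit_loss(traj)
assert after < before  # guidance pushes the sample toward query compliance
```

In the paper's setting, the key point is that the loss function itself is produced by an LLM from the user's natural-language query, so the same gradient-guidance machinery serves arbitrary instructions without retraining the traffic model.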
Cite
Text
Zhong et al. "Language-Guided Traffic Simulation via Scene-Level Diffusion." Conference on Robot Learning, 2023.
Markdown
[Zhong et al. "Language-Guided Traffic Simulation via Scene-Level Diffusion." Conference on Robot Learning, 2023.](https://mlanthology.org/corl/2023/zhong2023corl-languageguided/)
BibTeX
@inproceedings{zhong2023corl-languageguided,
title = {{Language-Guided Traffic Simulation via Scene-Level Diffusion}},
author = {Zhong, Ziyuan and Rempe, Davis and Chen, Yuxiao and Ivanovic, Boris and Cao, Yulong and Xu, Danfei and Pavone, Marco and Ray, Baishakhi},
booktitle = {Conference on Robot Learning},
year = {2023},
pages = {144--177},
volume = {229},
url = {https://mlanthology.org/corl/2023/zhong2023corl-languageguided/}
}