Decoupled Diffusion Sparks Adaptive Scene Generation

Abstract

Controllable scene generation could substantially reduce the cost of collecting diverse data for autonomous driving. Prior works formulate traffic layout generation as a predictive process, either by denoising entire sequences at once or by iteratively predicting the next frame. However, full-sequence denoising hinders online reaction, while short-sighted next-frame prediction lacks precise goal-state guidance. Further, learned models struggle to generate complex or challenging scenarios because open datasets are dominated by safe, ordinary driving behaviors. To overcome these limitations, we introduce Nexus, a decoupled scene generation framework that improves reactivity and goal conditioning by simulating both ordinary and challenging scenarios from fine-grained tokens with independent noise states. At the core of the decoupled pipeline is the integration of a partial noise-masking training strategy and a noise-aware schedule that ensures timely environmental updates throughout the denoising process. To complement challenging scenario generation, we collect a dataset of complex corner cases. It covers 540 hours of simulated data, including high-risk interactions such as cut-ins, sudden braking, and collisions. Nexus achieves superior generation realism while preserving reactivity and goal orientation, with a 40% reduction in displacement error. We further demonstrate that Nexus improves closed-loop planning by 20% through data augmentation and showcase its capability in safety-critical data generation.
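
The idea of fine-grained tokens with independent noise states, combined with partial noise masking, can be illustrated with a small sketch. This is not the Nexus implementation: the token layout (one scalar per agent and time step), the variance-preserving noising rule, the choice of which tokens stay clean, and the helper names `sample_noise_levels` and `add_noise` are all assumptions made for illustration.

```python
import math
import random


def sample_noise_levels(num_agents, num_steps, clean_mask, rng):
    """Draw one noise level in [0, 1) per token; masked tokens get level 0."""
    return [
        [0.0 if clean_mask[a][t] else rng.random() for t in range(num_steps)]
        for a in range(num_agents)
    ]


def add_noise(tokens, levels, rng):
    """Noise each token according to its own level.

    Assumed rule (not taken from the paper):
        x_noisy = sqrt(1 - s) * x + sqrt(s) * eps,  eps ~ N(0, 1)
    so a token with s = 0 is returned unchanged.
    """
    noisy = []
    for row, level_row in zip(tokens, levels):
        noisy.append([
            math.sqrt(1.0 - s) * x + math.sqrt(s) * rng.gauss(0.0, 1.0)
            for x, s in zip(row, level_row)
        ])
    return noisy


rng = random.Random(0)
num_agents, num_steps = 3, 5

# Toy scene: one scalar state per (agent, time step) token.
tokens = [[float(a + t) for t in range(num_steps)] for a in range(num_agents)]

# Hypothetical conditioning: the first step (observed history) and the
# last step (goal state) are kept clean; all other tokens are noised.
clean_mask = [
    [t == 0 or t == num_steps - 1 for t in range(num_steps)]
    for _ in range(num_agents)
]

levels = sample_noise_levels(num_agents, num_steps, clean_mask, rng)
noisy = add_noise(tokens, levels, rng)
```

Under these assumptions, a denoiser trained on such inputs would see clean goal and history tokens alongside tokens at mixed noise levels, which is one way to picture how goal conditioning and per-token updates could coexist in a single sequence.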

Cite

Text

Zhou et al. "Decoupled Diffusion Sparks Adaptive Scene Generation." International Conference on Computer Vision, 2025.

Markdown

[Zhou et al. "Decoupled Diffusion Sparks Adaptive Scene Generation." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/zhou2025iccv-decoupled/)

BibTeX

@inproceedings{zhou2025iccv-decoupled,
  title     = {{Decoupled Diffusion Sparks Adaptive Scene Generation}},
  author    = {Zhou, Yunsong and Ye, Naisheng and Ljungbergh, William and Li, Tianyu and Yang, Jiazhi and Yang, Zetong and Zhu, Hongzi and Petersson, Christoffer and Li, Hongyang},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {27760--27770},
  url       = {https://mlanthology.org/iccv/2025/zhou2025iccv-decoupled/}
}