COME: Adding Scene-Centric Forecasting Control to Occupancy World Model

Abstract

World models are critical for autonomous driving to simulate environmental dynamics and generate synthetic data. Existing methods struggle to disentangle ego-vehicle motion (perspective shifts) from scene evolution (agent interactions), leading to suboptimal predictions. To address this, we propose separating environmental changes from ego-motion by leveraging scene-centric coordinate systems. In this paper, we introduce COME: a framework that integrates scene-centric forecasting Control into the Occupancy world ModEl. Specifically, COME first generates ego-irrelevant, spatially consistent future features through a scene-centric prediction branch; these features are then converted into scene conditions by a tailored ControlNet. The condition features are subsequently injected into the occupancy world model, enabling more accurate and controllable future occupancy predictions. Experimental results on the nuScenes-Occ3D dataset show that COME achieves consistent and significant improvements over state-of-the-art (SOTA) methods across diverse configurations, including different input sources (ground-truth, camera-based, and fusion-based occupancy) and prediction horizons (3s and 8s). For example, under the same settings, COME achieves 26.3% better mIoU than DOME and 23.7% better mIoU than UniScene. These results highlight the efficacy of disentangled representation learning in enhancing spatio-temporal prediction fidelity for world models. Code is available at https://github.com/synsin0/COME.
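
To make the distinction between ego-motion and scene evolution concrete, the following is a minimal sketch of the general idea behind a scene-centric coordinate system, not the COME implementation. It assumes ego poses are given as 4x4 homogeneous transforms from the ego frame to a fixed scene frame; the helper names `make_pose` and `apply_pose` are hypothetical and chosen only for illustration. A static obstacle appears to move in the ego frame when the vehicle drives forward, but stays fixed once observations are mapped back into the scene frame.

```python
import numpy as np

def make_pose(yaw, tx, ty):
    """Hypothetical helper: 4x4 ego-to-scene transform from a planar pose."""
    c, s = np.cos(yaw), np.sin(yaw)
    pose = np.eye(4)
    pose[:2, :2] = [[c, -s], [s, c]]   # rotation about the z axis
    pose[:2, 3] = [tx, ty]             # translation in the ground plane
    return pose

def apply_pose(points, pose):
    """Apply a 4x4 homogeneous transform to an (N, 3) array of points."""
    points = np.asarray(points, dtype=float)
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (pose @ homo.T).T[:, :3]

# A static obstacle located 10 m ahead of the origin in the scene frame.
static_scene = np.array([[10.0, 0.0, 0.0]])

# Assumed ego poses at two time steps: the vehicle drives 2 m forward.
pose_t0 = make_pose(0.0, 0.0, 0.0)
pose_t1 = make_pose(0.0, 2.0, 0.0)

# Ego-centric observations: the scene point seen from each ego pose.
obs_t0 = apply_pose(static_scene, np.linalg.inv(pose_t0))
obs_t1 = apply_pose(static_scene, np.linalg.inv(pose_t1))

# Scene-centric view: undo ego-motion by mapping observations back.
scene_t0 = apply_pose(obs_t0, pose_t0)
scene_t1 = apply_pose(obs_t1, pose_t1)

print("ego frame:  ", obs_t0[0], "->", obs_t1[0])      # apparent motion
print("scene frame:", scene_t0[0], "->", scene_t1[0])  # no motion
```

In the ego frame the obstacle shifts from 10 m to 8 m ahead purely because of the perspective change, while in the scene frame its coordinates are identical at both time steps, so any remaining change there would reflect actual scene dynamics.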

Cite

Text

Shi et al. "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model." Advances in Neural Information Processing Systems, 2025.

Markdown

[Shi et al. "COME: Adding Scene-Centric Forecasting Control to Occupancy World Model." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shi2025neurips-come/)

BibTeX

@inproceedings{shi2025neurips-come,
  title     = {{COME: Adding Scene-Centric Forecasting Control to Occupancy World Model}},
  author    = {Shi, Yining and Jiang, Kun and Meng, Qiang and Wang, Ke and Wang, Jiabao and Sun, Wenchao and Wen, Tuopu and Yang, Mengmeng and Yang, Diange},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shi2025neurips-come/}
}