Gaussian-Based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction

Abstract

In autonomous driving, accurately predicting occupancy and motion is crucial for safe navigation in dynamic environments. However, existing methods often struggle with complex scenes and with the uncertainty arising from sensor data. To address these issues, we propose a new Gaussian-based World Model (GWM) that seamlessly integrates raw multi-modal sensor inputs. In the first stage, a Gaussian representation learner uses self-supervised pretraining to learn a robust Gaussian representation. This representation integrates semantic and geometric information and establishes a robust probabilistic understanding of the environment. In the second stage, GWM integrates learning, simulation, and planning into a unified framework, enabling an uncertainty-aware simulator and planner to jointly forecast future scene evolution and vehicle trajectories. The simulator generates future scene predictions by modeling both static and dynamic elements, while the planner computes optimal paths that minimize collision risk, thus enhancing navigation safety. Overall, GWM is a sensor-to-planning world model that directly processes raw sensor data, which sets it apart from previous methods. Experiments show that GWM outperforms state-of-the-art approaches by 1.46% in semantic comprehension and 0.07 m in motion prediction. Moreover, we provide an in-depth analysis of Gaussian representations in complex scenarios.
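To make the idea of a Gaussian representation feeding voxel-based occupancy more concrete, the following is a minimal sketch, not the authors' implementation. It assumes each Gaussian has a mean, a covariance, and an opacity, and that per-Gaussian contributions can be combined as independent occupancy probabilities. The function name `gaussians_to_occupancy` and all of its parameters are hypothetical and do not come from the paper.

```python
import numpy as np

def gaussians_to_occupancy(means, covs, opacities, grid_min, voxel_size, grid_shape):
    """Accumulate 3D Gaussians into a voxel occupancy-probability grid.

    Hypothetical helper (an assumption, not the paper's method): each Gaussian
    contributes opacity * exp(-0.5 * d^2), where d is the Mahalanobis distance
    from the voxel centre to the Gaussian mean. Contributions are merged as
    1 - prod(1 - p_i), i.e. treated as independent occupancy events.
    """
    # Integer voxel indices, shape (X, Y, Z, 3).
    idx = np.stack(
        np.meshgrid(*[np.arange(n) for n in grid_shape], indexing="ij"),
        axis=-1,
    )
    # World coordinates of voxel centres.
    centres = np.asarray(grid_min, dtype=float) + (idx + 0.5) * voxel_size

    free = np.ones(grid_shape)  # probability that each voxel is still free
    for mu, cov, alpha in zip(means, covs, opacities):
        diff = centres - mu                       # (X, Y, Z, 3)
        precision = np.linalg.inv(cov)            # (3, 3)
        maha = np.einsum("...i,ij,...j->...", diff, precision, diff)
        free *= 1.0 - alpha * np.exp(-0.5 * maha)
    return 1.0 - free

# Usage: one Gaussian centred on the first voxel of a 4x4x4 grid.
means = np.array([[0.5, 0.5, 0.5]])
covs = np.array([np.eye(3) * 0.25])
opacities = np.array([0.9])
occ = gaussians_to_occupancy(
    means, covs, opacities,
    grid_min=(0.0, 0.0, 0.0), voxel_size=1.0, grid_shape=(4, 4, 4),
)
```

In this toy setup the voxel containing the Gaussian mean receives an occupancy close to the Gaussian's opacity, while distant voxels stay near zero. A learned model would predict the Gaussian parameters from sensor data and would typically attach semantic labels as well, which this sketch leaves out.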

Cite

Text

Feng et al. "Gaussian-Based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction." International Conference on Computer Vision, 2025.

Markdown

[Feng et al. "Gaussian-Based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/feng2025iccv-gaussianbased/)

BibTeX

@inproceedings{feng2025iccv-gaussianbased,
  title     = {{Gaussian-Based World Model: Gaussian Priors for Voxel-Based Occupancy Prediction and Future Motion Prediction}},
  author    = {Feng, Tuo and Wang, Wenguan and Yang, Yi},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {25239--25249},
  url       = {https://mlanthology.org/iccv/2025/feng2025iccv-gaussianbased/}
}