CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting
Abstract
We propose CARFF, a method for predicting future 3D scenes given past observations. Our method maps 2D ego-centric images to a distribution over plausible 3D latent scene configurations and predicts the evolution of hypothesized scenes through time. The latents condition a global Neural Radiance Field (NeRF) to represent a 3D scene model, enabling explainable predictions and straightforward downstream planning. This approach models the world as a partially observable Markov decision process (POMDP) and handles complex scenarios involving uncertainty in environmental states and dynamics. Specifically, we employ a two-stage training of a Pose-Conditional-VAE and a NeRF to learn 3D representations, and auto-regressively predict latent scene representations using a mixture density network. We demonstrate the utility of our method in scenarios using the CARLA driving simulator, where CARFF enables efficient trajectory and contingency planning in complex multi-agent autonomous driving scenarios involving occlusions. Video and code are available at: www.carff.website.
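To make the forecasting step concrete, the sketch below shows how hypothesized latent scene states could be rolled out auto-regressively with a mixture density network (MDN), as described in the abstract. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation: the module name `LatentMDN`, its dimensions, and all hyperparameters are illustrative assumptions. In CARFF, each sampled latent would then condition the global NeRF to render a predicted 3D scene.

```python
# Hedged sketch (assumed, not the authors' code): auto-regressive
# forecasting of latent scene states with a mixture density network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentMDN(nn.Module):
    """Predicts a Gaussian mixture over the next latent z_{t+1} given z_t."""
    def __init__(self, latent_dim=64, hidden_dim=256, n_components=5):
        super().__init__()
        self.n_components = n_components
        self.latent_dim = latent_dim
        self.backbone = nn.Sequential(
            nn.Linear(latent_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        # Heads for mixture weights, means, and (log) std-devs per component.
        self.pi_head = nn.Linear(hidden_dim, n_components)
        self.mu_head = nn.Linear(hidden_dim, n_components * latent_dim)
        self.log_sigma_head = nn.Linear(hidden_dim, n_components * latent_dim)

    def forward(self, z):
        h = self.backbone(z)
        log_pi = F.log_softmax(self.pi_head(h), dim=-1)
        mu = self.mu_head(h).view(-1, self.n_components, self.latent_dim)
        sigma = self.log_sigma_head(h).view(
            -1, self.n_components, self.latent_dim).exp()
        return log_pi, mu, sigma

    def sample_next(self, z):
        """Sample one hypothesized next latent for a scene rollout."""
        log_pi, mu, sigma = self.forward(z)
        # Pick a mixture component, then sample from its Gaussian.
        k = torch.distributions.Categorical(logits=log_pi).sample()
        idx = k.view(-1, 1, 1).expand(-1, 1, self.latent_dim)
        mu_k = mu.gather(1, idx).squeeze(1)
        sigma_k = sigma.gather(1, idx).squeeze(1)
        return mu_k + sigma_k * torch.randn_like(mu_k)

# Auto-regressive rollout: each sampled latent would condition the global
# NeRF to render one hypothesized future 3D scene.
mdn = LatentMDN()
z = torch.randn(1, 64)  # e.g., a latent from the pose-conditional encoder
trajectory = [z]
for _ in range(5):
    z = mdn.sample_next(z)
    trajectory.append(z)
```

The mixture output is what lets the model keep several distinct futures alive at once, e.g., whether an occluded vehicle does or does not emerge, which is the multi-modality the abstract's planning scenarios rely on.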
Cite
Text
Yang et al. "CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73024-5_14
Markdown
[Yang et al. "CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/yang2024eccv-carff/) doi:10.1007/978-3-031-73024-5_14
BibTeX
@inproceedings{yang2024eccv-carff,
title = {{CARFF: Conditional Auto-Encoded Radiance Field for 3D Scene Forecasting}},
author = {Yang, Jiezhi and Desai, Khushi P. and Packer, Charles and Bhatia, Harshil and Rhinehart, Nicholas and McAllister, Rowan and Gonzalez, Joseph E.},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73024-5_14},
url = {https://mlanthology.org/eccv/2024/yang2024eccv-carff/}
}