SCube: Instant Large-Scale Scene Reconstruction Using VoxSplats

Abstract

We present SCube, a novel method for reconstructing large-scale 3D scenes (geometry, appearance, and semantics) from a sparse set of posed images. Our method encodes reconstructed scenes using a novel representation, VoxSplat, which is a set of 3D Gaussians supported on a high-resolution sparse-voxel scaffold. To reconstruct a VoxSplat from images, we employ a hierarchical voxel latent diffusion model conditioned on the input images, followed by a feedforward appearance prediction model. The diffusion model generates high-resolution grids progressively in a coarse-to-fine manner, and the appearance network predicts a set of Gaussians within each voxel. From as few as 3 non-overlapping input images, SCube can generate millions of Gaussians with a 1024³ voxel grid spanning hundreds of meters in 20 seconds. Past works tackling scene reconstruction from images either rely on per-scene optimization and fail to reconstruct the scene away from input views (thus requiring dense view coverage as input) or leverage geometric priors based on low-resolution models, which produce blurry results. In contrast, SCube leverages high-resolution sparse networks and produces sharp outputs from few views. We show the superiority of SCube compared to prior art using the Waymo self-driving dataset on 3D reconstruction and demonstrate its applications, such as LiDAR simulation and text-to-scene generation.
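
To make the idea of "3D Gaussians supported on a sparse-voxel scaffold" concrete, the following is a minimal illustrative sketch in plain Python. It is not the authors' implementation. The class names (`Gaussian`, `VoxSplat`), the field layout, the convention that each Gaussian stores an offset from its voxel center in voxel units, and the `voxel_size` value are all assumptions made for illustration; the abstract specifies none of them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Index3 = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """One splat. All fields are a hypothetical parameterization."""
    offset: Vec3    # displacement from the voxel center, in voxel units (assumed)
    scale: Vec3     # per-axis extent
    opacity: float
    color: Vec3     # RGB in [0, 1]


@dataclass
class VoxSplat:
    """Sparse voxel scaffold: only occupied voxels are stored."""
    resolution: int      # grid is resolution^3 cells if stored densely
    voxel_size: float    # metres per voxel edge (hypothetical value)
    voxels: Dict[Index3, List[Gaussian]] = field(default_factory=dict)

    def add(self, ijk: Index3, gaussians: List[Gaussian]) -> None:
        """Attach Gaussians to the voxel at integer index ijk."""
        if not all(0 <= c < self.resolution for c in ijk):
            raise ValueError(f"voxel index {ijk} is outside the grid")
        self.voxels.setdefault(ijk, []).extend(gaussians)

    def num_gaussians(self) -> int:
        return sum(len(gs) for gs in self.voxels.values())

    def occupancy(self) -> float:
        """Fraction of the dense grid that is actually stored."""
        return len(self.voxels) / self.resolution ** 3

    def world_centers(self) -> List[Vec3]:
        """World-space Gaussian centers: (index + 0.5 + offset) * voxel_size."""
        centers = []
        for ijk, gs in self.voxels.items():
            for g in gs:
                centers.append(tuple(
                    (c + 0.5 + o) * self.voxel_size
                    for c, o in zip(ijk, g.offset)
                ))
        return centers


scene = VoxSplat(resolution=1024, voxel_size=0.5)
scene.add((2, 0, 1), [
    Gaussian(offset=(0.0, 0.25, -0.5), scale=(0.1, 0.1, 0.1),
             opacity=0.9, color=(0.5, 0.5, 0.5)),
])
print(scene.num_gaussians(), scene.world_centers())
```

A dense 1024³ grid has 1,073,741,824 cells, so storing only occupied voxels in a dictionary (or, in practice, a dedicated sparse-grid structure) is what keeps a scaffold of this resolution tractable.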

Cite

Text

Ren et al. "SCube: Instant Large-Scale Scene Reconstruction Using VoxSplats." Neural Information Processing Systems, 2024. doi:10.52202/079017-3099

Markdown

[Ren et al. "SCube: Instant Large-Scale Scene Reconstruction Using VoxSplats." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/ren2024neurips-scube/) doi:10.52202/079017-3099

BibTeX

@inproceedings{ren2024neurips-scube,
  title     = {{SCube: Instant Large-Scale Scene Reconstruction Using VoxSplats}},
  author    = {Ren, Xuanchi and Lu, Yifan and Liang, Hanxue and Wu, Zhangjie and Ling, Huan and Chen, Mike and Fidler, Sanja and Williams, Francis and Huang, Jiahui},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3099},
  url       = {https://mlanthology.org/neurips/2024/ren2024neurips-scube/}
}