NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models

Kim, Seung Wook; Brown, Bradley; Yin, Kangxue; Kreis, Karsten; Schwarz, Katja; Li, Daiqing; Rombach, Robin; Torralba, Antonio; Fidler, Sanja

doi:10.1109/CVPR52729.2023.00821

CVPR 2023 pp. 8496-8506

Abstract

Automatically generating high-quality real-world 3D scenes is of enormous interest for applications such as virtual reality and robotics simulation. Towards this goal, we introduce NeuralField-LDM, a generative model capable of synthesizing complex 3D environments. We leverage Latent Diffusion Models, which have been successfully utilized for efficient high-quality 2D content creation. We first train a scene auto-encoder to express a set of image and pose pairs as a neural field, represented as density and feature voxel grids that can be projected to produce novel views of the scene. To further compress this representation, we train a latent auto-encoder that maps the voxel grids to a set of latent representations. A hierarchical diffusion model is then fit to the latents to complete the scene generation pipeline. We achieve a substantial improvement over existing state-of-the-art scene generation models. Additionally, we show how NeuralField-LDM can be used for a variety of 3D content creation applications, including conditional scene generation, scene inpainting, and scene style manipulation.
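The abstract says the density and feature voxel grids "can be projected to produce novel views" but does not spell out the projection step. As a rough illustration only, the sketch below assumes standard volume-rendering alpha compositing along a single camera ray, a common choice for neural fields and not confirmed here as the authors' procedure. The function name `composite_ray`, its parameters, and the idea that samples have already been read from the grids along the ray are all hypothetical.

```python
import math


def composite_ray(densities, features, step):
    """Alpha-composite feature samples along one ray.

    ASSUMPTION: standard volume rendering; the abstract does not state
    the exact projection used by NeuralField-LDM.

    densities: non-negative density per sample, ordered near to far.
    features:  one feature vector (list of floats) per sample.
    step:      distance between consecutive samples along the ray.
    Returns (composited feature vector, accumulated opacity).
    """
    transmittance = 1.0  # fraction of light not yet absorbed
    out = [0.0] * len(features[0])
    for sigma, feat in zip(densities, features):
        # Opacity contributed by this sample over a segment of length `step`.
        alpha = 1.0 - math.exp(-sigma * step)
        weight = transmittance * alpha
        out = [o + weight * f for o, f in zip(out, feat)]
        # Light remaining for samples further along the ray.
        transmittance *= 1.0 - alpha
    return out, 1.0 - transmittance


if __name__ == "__main__":
    # Hypothetical samples: an empty segment followed by a denser one.
    feats, opacity = composite_ray([0.0, 2.0], [[1.0, 0.0], [0.0, 1.0]], 0.5)
    print(feats, opacity)
```

Under this assumption, an empty ray (all densities zero) contributes nothing, and a very dense first sample hides everything behind it. Repeating the compositing for one ray per pixel would give a feature image for a chosen camera pose.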

Cite

Text

Kim et al. "NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00821

Markdown

[Kim et al. "NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/kim2023cvpr-neuralfieldldm/) doi:10.1109/CVPR52729.2023.00821

BibTeX

@inproceedings{kim2023cvpr-neuralfieldldm,
  title     = {{NeuralField-LDM: Scene Generation with Hierarchical Latent Diffusion Models}},
  author    = {Kim, Seung Wook and Brown, Bradley and Yin, Kangxue and Kreis, Karsten and Schwarz, Katja and Li, Daiqing and Rombach, Robin and Torralba, Antonio and Fidler, Sanja},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {8496-8506},
  doi       = {10.1109/CVPR52729.2023.00821},
  url       = {https://mlanthology.org/cvpr/2023/kim2023cvpr-neuralfieldldm/}
}