Disentangled 3D Scene Generation with Layout Learning

Abstract

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs—each representing its own object—along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.
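To make the described pipeline concrete, below is a minimal sketch (not the authors' code) of the joint optimization in the abstract: K per-object NeRFs and N layouts of learnable per-object transforms are optimized together, with composited renderings pushed toward the text-to-image model's distribution. PyTorch is assumed; the tiny MLPs, the `render_composite` compositor, and the `sds_loss` stub are hypothetical placeholders standing in for real volume rendering and score-distillation guidance.

```python
import torch
import torch.nn as nn

K, N = 3, 2          # K object NeRFs, N layouts that arrange them into scenes
IMG = 64             # rendered image resolution (placeholder)

class TinyNeRF(nn.Module):
    """Stand-in for a per-object NeRF: maps 3D points to (density, RGB)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))
    def forward(self, x):                      # x: (..., 3)
        out = self.mlp(x)
        return out[..., :1], out[..., 1:]      # density, color

objects = nn.ModuleList(TinyNeRF() for _ in range(K))
# Each layout holds a learnable translation per object (the paper also learns rotation).
layouts = nn.Parameter(torch.randn(N, K, 3) * 0.1)

def render_composite(objects, layout, img=IMG):
    """Placeholder compositing renderer: shifts each object by its layout
    transform, queries every NeRF, and merges them by density. A stand-in
    for real ray marching / volume rendering."""
    pts = torch.rand(img * img, 3) * 2 - 1
    densities, colors = [], []
    for k, obj in enumerate(objects):
        d, c = obj(pts - layout[k])            # move points into object frame
        densities.append(d)
        colors.append(c)
    w = torch.softmax(torch.cat(densities, -1), -1).unsqueeze(-1)
    rgb = (w * torch.stack(colors, -2)).sum(-2)
    return rgb.view(img, img, 3)

def sds_loss(image, prompt):
    """Stub for score-distillation guidance from a pretrained text-to-image
    model; here it only penalizes out-of-range pixels so the sketch runs."""
    return ((image - image.clamp(0, 1)) ** 2).mean()

params = list(objects.parameters()) + [layouts]
opt = torch.optim.Adam(params, lr=1e-3)
for step in range(100):
    n = torch.randint(N, (1,)).item()          # sample one layout per step
    image = render_composite(objects, layouts[n])
    loss = sds_loss(image, "a cozy living room with a sofa and a lamp")
    opt.zero_grad(); loss.backward(); opt.step()
```

In the full method each rendered composite is scored by the pretrained image generator, so gradients flow both into the object NeRFs and into the layouts, which is what encourages each NeRF to capture a complete, rearrangeable object.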

Cite

Text

Epstein et al. "Disentangled 3D Scene Generation with Layout Learning." International Conference on Machine Learning, 2024.

Markdown

[Epstein et al. "Disentangled 3D Scene Generation with Layout Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/epstein2024icml-disentangled/)

BibTeX

@inproceedings{epstein2024icml-disentangled,
  title     = {{Disentangled 3D Scene Generation with Layout Learning}},
  author    = {Epstein, Dave and Poole, Ben and Mildenhall, Ben and Efros, Alexei A. and Holynski, Aleksander},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {12547--12559},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/epstein2024icml-disentangled/}
}