Disentangled 3D Scene Generation with Layout Learning
Abstract
We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects can be discovered by finding parts of a 3D scene that, when rearranged spatially, still produce valid configurations of the same scene. Concretely, our method jointly optimizes multiple NeRFs—each representing its own object—along with a set of layouts that composite these objects into scenes. We then encourage these composited scenes to be in-distribution according to the image generator. We show that despite its simplicity, our approach successfully generates 3D scenes decomposed into individual objects, enabling new capabilities in text-to-3D content creation.
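Below is a minimal, hypothetical sketch of the joint optimization described in the abstract, assuming PyTorch. The names (TinyNeRF, render_composite, sds_like_loss) are illustrative stand-ins, not the authors' code: the actual method uses full NeRFs, richer per-object layout transforms, and score distillation sampling against the pretrained text-to-image model rather than the dummy loss used here.

```python
# Hypothetical sketch: K per-object NeRFs + L learned layouts, jointly optimized
# so that every layout's composited scene scores well under a (stand-in) loss.
import torch
import torch.nn as nn

class TinyNeRF(nn.Module):
    """Stand-in for one per-object NeRF: maps 3D points to (density, RGB)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 RGB channels
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])

def render_composite(nerfs, layout, points):
    """Composite K object NeRFs under one layout (per-object translations here,
    as a simplification; the paper learns richer per-object transforms)."""
    densities, rgbs = [], []
    for nerf, offset in zip(nerfs, layout):
        d, c = nerf(points - offset)  # query each object in its own frame
        densities.append(d)
        rgbs.append(c)
    density = torch.stack(densities).sum(0)
    rgb = (torch.stack(rgbs) * torch.stack(densities)).sum(0) / (density + 1e-6)
    return density, rgb

def sds_like_loss(density, rgb):
    """Placeholder for the score-distillation loss from the pretrained
    text-to-image model; a dummy scalar so the sketch runs end to end."""
    return (rgb.mean() - 0.5) ** 2 + 1e-3 * density.mean()

K, L = 3, 2  # number of objects and number of layouts (illustrative values)
nerfs = nn.ModuleList(TinyNeRF() for _ in range(K))
layouts = nn.Parameter(torch.randn(L, K, 3) * 0.1)  # per-layout object placements
opt = torch.optim.Adam(list(nerfs.parameters()) + [layouts], lr=1e-3)

for step in range(10):
    points = torch.rand(1024, 3) * 2 - 1  # random query points in [-1, 1]^3
    loss = 0.0
    for layout in layouts:  # every layout must yield a valid scene
        density, rgb = render_composite(nerfs, layout, points)
        loss = loss + sds_like_loss(density, rgb)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key structural point the sketch illustrates is that the same set of object NeRFs is shared across all layouts, so gradients from every composited scene shape the same objects; only the layouts differ.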
Cite
Text
Epstein et al. "Disentangled 3D Scene Generation with Layout Learning." International Conference on Machine Learning, 2024.

Markdown

[Epstein et al. "Disentangled 3D Scene Generation with Layout Learning." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/epstein2024icml-disentangled/)

BibTeX
@inproceedings{epstein2024icml-disentangled,
  title     = {{Disentangled 3D Scene Generation with Layout Learning}},
  author    = {Epstein, Dave and Poole, Ben and Mildenhall, Ben and Efros, Alexei A and Holynski, Aleksander},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {12547--12559},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/epstein2024icml-disentangled/}
}