SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
Abstract
We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor supports text-guided 3D scene synthesis through a factored diffusion formulation, leveraging latent semantic and geometric manifolds to generate arbitrary-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion instead generates a proxy semantic space composed of semantic 3D boxes; generated scenes can then be edited by adding, removing, or resizing these proxy boxes, which guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our factored diffusion approach enables high-fidelity 3D scene synthesis with effective controllable editing.
Cite
Text
Bokhovkin et al. "SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00067

Markdown
[Bokhovkin et al. "SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/bokhovkin2025cvpr-scenefactor/) doi:10.1109/CVPR52734.2025.00067

BibTeX
@inproceedings{bokhovkin2025cvpr-scenefactor,
title = {{SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation}},
author = {Bokhovkin, Aleksey and Meng, Quan and Tulsiani, Shubham and Dai, Angela},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {628-639},
doi = {10.1109/CVPR52734.2025.00067},
url = {https://mlanthology.org/cvpr/2025/bokhovkin2025cvpr-scenefactor/}
}