FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
Abstract
Controllability plays a crucial role in the practical applications of 3D indoor scene synthesis. Existing works either allow rough language-based control, which is convenient but lacks fine-grained scene customization, or employ graph-based control, which offers better controllability but demands considerable knowledge for the cumbersome graph design process. To address these challenges, we present FreeScene, a user-friendly framework that enables both convenient and effective control for indoor scene synthesis. Specifically, FreeScene supports free-form user inputs, including text descriptions and/or reference images, allowing users to express versatile design intentions. The user inputs are analyzed and integrated into a graph representation by a VLM-based Graph Designer. We then propose MG-DiT, a Mixed Graph Diffusion Transformer, which performs graph-aware denoising to enhance scene generation. MG-DiT not only excels at preserving graph structure but also applies to a broad range of tasks, including, but not limited to, text-to-scene, graph-to-scene, and rearrangement, all within a single model. Extensive experiments demonstrate that FreeScene provides an efficient and user-friendly solution that unifies text-based and graph-based scene synthesis, outperforming state-of-the-art methods in both generation quality and controllability across a range of applications.
Cite
Text
Bai et al. "FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.00553
Markdown
[Bai et al. "FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/bai2025cvpr-freescene/) doi:10.1109/CVPR52734.2025.00553
BibTeX
@inproceedings{bai2025cvpr-freescene,
title = {{FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts}},
author = {Bai, Tongyuan and Bai, Wangyuanfan and Chen, Dong and Wu, Tieru and Li, Manyi and Ma, Rui},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {5893-5903},
doi = {10.1109/CVPR52734.2025.00553},
url = {https://mlanthology.org/cvpr/2025/bai2025cvpr-freescene/}
}