EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion

Abstract

We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes on scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle to handle scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enables collaborative information exchange, enhancing controllable and consistent generation aware of global constraints. This is achieved through an information echo scheme in both shape and layout branches. At every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that the denoising processes are influenced by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and sampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Our code and models are open-sourced.
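The information echo scheme described above can be illustrated with a toy sketch. It is not the authors' implementation. It assumes each scene-graph node carries a small latent vector, replaces the learned denoiser with a simple pull toward zero, and replaces the learned graph convolution with an unweighted mean over each node's neighbors. The names `echo_step`, `generate`, `mix`, and `step_size` are hypothetical and chosen only for illustration.

```python
import random


def echo_step(latents, edges, mix=0.5, step_size=0.1):
    """One toy denoising step with neighbor exchange (illustrative only)."""
    n = len(latents)
    dim = len(latents[0])

    # Per-node "denoising feature". Assumption: a pull toward zero stands in
    # for the output of a learned denoiser.
    local = [[-x for x in v] for v in latents]

    # Neighbor lists for an undirected graph, with a self-loop on every node.
    nbrs = {i: [i] for i in range(n)}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)

    # Exchange unit: mean aggregation over neighbors, a weight-free stand-in
    # for a graph convolution layer.
    echoed = [
        [sum(local[j][d] for j in nbrs[i]) / len(nbrs[i]) for d in range(dim)]
        for i in range(n)
    ]

    # Each node blends its own feature with the echoed one, then updates.
    new_latents = []
    for i in range(n):
        update = [(1 - mix) * local[i][d] + mix * echoed[i][d] for d in range(dim)]
        new_latents.append(
            [latents[i][d] + step_size * update[d] for d in range(dim)]
        )
    return new_latents


def generate(num_nodes, edges, dim=3, steps=50, seed=0):
    """Run one denoising process per node, exchanging information every step."""
    rng = random.Random(seed)
    latents = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_nodes)]
    for _ in range(steps):
        latents = echo_step(latents, edges)
    return latents


if __name__ == "__main__":
    # Hypothetical three-node scene graph: bed (0), nightstand (1), lamp (2).
    scene_edges = [(0, 1), (1, 2)]
    for node_latent in generate(3, scene_edges):
        print(node_latent)
```

Because every node runs its own process and only the exchange step touches the graph, adding or removing a node or edge changes the inputs to `echo_step` but not its structure, which is the property the abstract relies on for handling graphs of varying size and for editing at inference time.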

Cite

Text

Zhai et al. "EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72664-4_10

Markdown

[Zhai et al. "EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/zhai2024eccv-echoscene/) doi:10.1007/978-3-031-72664-4_10

BibTeX

@inproceedings{zhai2024eccv-echoscene,
  title     = {{EchoScene: Indoor Scene Generation via Information Echo over Scene Graph Diffusion}},
  author    = {Zhai, Guangyao and Örnek, Evin Pınar and Chen, Dave Zhenyu and Liao, Ruotong and Di, Yan and Navab, Nassir and Tombari, Federico and Busam, Benjamin},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72664-4_10},
  url       = {https://mlanthology.org/eccv/2024/zhai2024eccv-echoscene/}
}