Scene Graph-Grounded Image Generation
Abstract
Benefiting from the explicit object-oriented reasoning capabilities of scene graphs, scene graph-to-image generation has made remarkable advances in comprehending object coherence and interactive relations. Recent state-of-the-art methods typically predict a scene layout as an intermediate representation of the scene graph before synthesizing the image. Nevertheless, transforming a scene graph into an exact layout may restrict its representation capabilities, leading to discrepancies in interactive relationships (such as standing on, wearing, or covering) between the generated image and the input scene graph. In this paper, we propose a Scene Graph-Grounded Image Generation (SGG-IG) method to mitigate these issues. Specifically, to enhance the scene graph representation, we design a masked auto-encoder module and a relation embedding learning module to integrate structural knowledge and contextual information of the scene graph in a masked self-supervised manner. Subsequently, to bridge the scene graph with visual content, we introduce a spatial constraint and an image-scene alignment constraint to capture the fine-grained visual correlation between the scene graph symbol representation and the corresponding image representation, thereby generating semantically consistent and high-quality images. Extensive experiments demonstrate the effectiveness of the method both quantitatively and qualitatively.
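To make the masked self-supervised idea concrete, the following is a minimal sketch of how a scene graph might be stored as objects plus (subject, predicate, object) triples, and how a fraction of its labels could be hidden so that a model is trained to recover them. It is an illustration only, not the authors' implementation: the names `SceneGraph`, `mask_scene_graph`, the `[MASK]` token, and the masking ratio are all assumptions introduced here.

```python
import random
from dataclasses import dataclass

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


@dataclass(frozen=True)
class SceneGraph:
    objects: tuple    # object labels, e.g. ("man", "horse")
    relations: tuple  # (subject_index, predicate, object_index) triples


def _pick(rng, n, ratio):
    """Choose which of n positions to mask (at least one when n > 0)."""
    if n == 0:
        return set()
    k = min(n, max(1, round(ratio * n)))
    return set(rng.sample(range(n), k))


def mask_scene_graph(graph, ratio=0.3, seed=None):
    """Hide a fraction of object labels and predicates.

    Returns the masked graph and the hidden labels, which would serve
    as reconstruction targets for a masked auto-encoder.
    """
    rng = random.Random(seed)
    obj_idx = _pick(rng, len(graph.objects), ratio)
    rel_idx = _pick(rng, len(graph.relations), ratio)

    objects = tuple(
        MASK if i in obj_idx else label
        for i, label in enumerate(graph.objects)
    )
    # Only the predicate is hidden; the graph structure (indices) is kept.
    relations = tuple(
        (s, MASK if i in rel_idx else p, o)
        for i, (s, p, o) in enumerate(graph.relations)
    )
    targets = {
        "objects": {i: graph.objects[i] for i in obj_idx},
        "relations": {i: graph.relations[i][1] for i in rel_idx},
    }
    return SceneGraph(objects, relations), targets


if __name__ == "__main__":
    graph = SceneGraph(
        objects=("man", "horse", "hat", "grass"),
        relations=((0, "riding", 1), (0, "wearing", 2), (1, "standing on", 3)),
    )
    masked, targets = mask_scene_graph(graph, ratio=0.5, seed=0)
    print(masked)
    print(targets)
```

In this sketch only labels are masked while the edge indices are left intact, so the structural context remains available for predicting the hidden entries. How the paper actually selects and encodes masked elements is not stated in the abstract.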
Cite
Text
Wang et al. "Scene Graph-Grounded Image Generation." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I7.32823
Markdown
[Wang et al. "Scene Graph-Grounded Image Generation." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-scene/) doi:10.1609/AAAI.V39I7.32823
BibTeX
@inproceedings{wang2025aaai-scene,
title = {{Scene Graph-Grounded Image Generation}},
author = {Wang, Fuyun and Zhang, Tong and Wang, Yuanzhi and Zhang, Xiaoya and Liu, Xin and Cui, Zhen},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {7646-7654},
doi = {10.1609/AAAI.V39I7.32823},
url = {https://mlanthology.org/aaai/2025/wang2025aaai-scene/}
}