I-DRUID: Layout to Image Generation via Instance-Disentangled Representation and Unpaired Data
Abstract
Layout-to-Image (L2I) generation, which aims to coherently generate multiple instances conditioned on given layouts and instance captions, has attracted substantial attention in recent research. The primary challenges of L2I stem from 1) attribute leakage due to entangled instance features within attention and 2) limited generalization to novel scenes caused by insufficient image-text paired data. To address these issues, we propose I-DRUID, a novel framework that leverages instance-disentangled representations (IDR) and unpaired data (UID) to improve L2I generation. IDR are extracted with our instance disentanglement modules (IDM), which use information across instances to obtain semantic-related features while suppressing spurious parts. To facilitate disentangling, we require semantic-related features to trigger more accurate attention maps than spurious ones, which forms the instance-disentangled constraint that prevents attribute leakage. Moreover, to improve L2I generalization, we adapt the L2I model to novel scenes with unpaired, prompt-only data (UID) via reinforcement learning (RL). Specifically, we train the L2I model on this prompt-only data by encouraging rational generation trajectories and rejecting implausible ones based on AI feedback, which avoids the need to collect paired data. Finally, our empirical observations show that IDM and RL work synergistically to further improve L2I accuracy. Extensive experiments demonstrate the efficacy of our method.
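The abstract does not spell out the instance-disentangled constraint, so the following is only an illustrative sketch of one way a "semantic features should yield more accurate attention than spurious ones" requirement could be written as a hinge-style ranking penalty. Everything here is an assumption made for illustration and not taken from the paper: the function names, the `margin` value, and the choice to measure attention accuracy as the share of attention mass inside the instance's layout box.

```python
# Hypothetical sketch, not the authors' implementation.
# Attention maps are plain 2D lists of non-negative weights.

def attention_mass_in_box(attn, box):
    """Share of total attention inside box = (top, left, bottom, right), end-exclusive."""
    top, left, bottom, right = box
    total = sum(sum(row) for row in attn)
    if total == 0:
        return 0.0  # no attention anywhere: treat accuracy as zero
    inside = sum(sum(row[left:right]) for row in attn[top:bottom])
    return inside / total


def disentangle_hinge(attn_semantic, attn_spurious, box, margin=0.2):
    """Penalty that is zero only when the semantic map beats the spurious map by `margin`."""
    gap = attention_mass_in_box(attn_semantic, box) - attention_mass_in_box(attn_spurious, box)
    return max(0.0, margin - gap)


# Usage: a 2x2 map where the instance's layout box is the top-left cell.
semantic_map = [[8, 1], [1, 0]]   # attention concentrated inside the box
spurious_map = [[1, 1], [1, 1]]   # attention spread uniformly
instance_box = (0, 0, 1, 1)

# The semantic map is clearly more accurate, so the penalty evaluates to 0.0.
penalty = disentangle_hinge(semantic_map, spurious_map, instance_box)
```

A ranking-style penalty is used in this sketch because the abstract phrases the requirement comparatively (semantic versus spurious) rather than as an absolute target for either map.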
Cite
Text
Yang et al. "I-DRUID: Layout to Image Generation via Instance-Disentangled Representation and Unpaired Data." International Conference on Learning Representations, 2026.
Markdown
[Yang et al. "I-DRUID: Layout to Image Generation via Instance-Disentangled Representation and Unpaired Data." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/yang2026iclr-idruid/)
BibTeX
@inproceedings{yang2026iclr-idruid,
title = {{I-DRUID: Layout to Image Generation via Instance-Disentangled Representation and Unpaired Data}},
author = {Yang, Fengxiang and Zheng, Tianyi and Yin, Bangjie and Liu, Shice and Chen, Jinwei and Jiang, Peng-Tao and Li, Bo},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/yang2026iclr-idruid/}
}