Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis

Abstract

We present Zero-Painter, a novel training-free framework for layout-conditional text-to-image synthesis that facilitates the creation of detailed and controlled imagery from textual prompts. Our method takes object masks with individual descriptions, together with a global text prompt, and generates high-fidelity images. Zero-Painter employs a two-stage process built on our novel Prompt-Adjusted Cross-Attention (PACA) and Region-Grouped Cross-Attention (ReGCA) blocks, which ensure precise alignment of generated objects with textual prompts and mask shapes. Our extensive experiments demonstrate that Zero-Painter surpasses current state-of-the-art methods in preserving textual details and adhering to mask shapes. We will make the code and models publicly available.
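
To make the idea of conditioning on per-object masks and prompts more concrete, the following is a minimal NumPy sketch of a generic region-restricted cross-attention: image positions inside an object mask attend only to that object's prompt tokens, and all remaining positions attend to the global prompt. This is an illustrative assumption, not the paper's PACA or ReGCA definition, which the abstract does not specify. The function names, tensor shapes, and random toy inputs are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Standard scaled dot-product attention: q (N, d), k (T, d), v (T, d_v).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def region_restricted_attention(q, region_masks, region_kv, global_kv):
    # Hypothetical helper (not from the paper): every position first attends
    # to the global prompt; positions inside an object mask are then
    # overwritten with attention over that object's own prompt tokens.
    out = attend(q, *global_kv)
    for mask, (k, v) in zip(region_masks, region_kv):
        if mask.any():
            out[mask] = attend(q[mask], k, v)
    return out

# Toy inputs: a 4x4 latent flattened to 16 positions, feature size 8.
rng = np.random.default_rng(0)
N, d = 16, 8
q = rng.normal(size=(N, d))

mask_a = np.zeros(N, dtype=bool)
mask_a[:4] = True      # first object occupies the top row
mask_b = np.zeros(N, dtype=bool)
mask_b[8:12] = True    # second object occupies the third row

kv_a = (rng.normal(size=(3, d)), rng.normal(size=(3, d)))  # object A prompt tokens
kv_b = (rng.normal(size=(5, d)), rng.normal(size=(5, d)))  # object B prompt tokens
kv_g = (rng.normal(size=(7, d)), rng.normal(size=(7, d)))  # global prompt tokens

out = region_restricted_attention(q, [mask_a, mask_b], [kv_a, kv_b], kv_g)
```

In this sketch the masks are assumed to be non-overlapping; if they overlapped, later regions would simply overwrite earlier ones, which is a simplification rather than a design taken from the paper.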

Cite

Text

Ohanyan et al. "Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00837

Markdown

[Ohanyan et al. "Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/ohanyan2024cvpr-zeropainter/) doi:10.1109/CVPR52733.2024.00837

BibTeX

@inproceedings{ohanyan2024cvpr-zeropainter,
  title     = {{Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis}},
  author    = {Ohanyan, Marianna and Manukyan, Hayk and Wang, Zhangyang and Navasardyan, Shant and Shi, Humphrey},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {8764-8774},
  doi       = {10.1109/CVPR52733.2024.00837},
  url       = {https://mlanthology.org/cvpr/2024/ohanyan2024cvpr-zeropainter/}
}