LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration

Abstract

Text-to-image (T2I) generation has made remarkable progress, yet existing systems still lack intuitive control over spatial composition, object consistency, and multi-step editing. We present **LayerCraft**, a modular framework that uses large language models (LLMs) as autonomous agents to orchestrate structured, layered image generation and editing. LayerCraft supports two key capabilities: (1) *structured generation* from simple prompts via chain-of-thought (CoT) reasoning, enabling it to decompose scenes, reason about object placement, and guide composition in a controllable, interpretable manner; and (2) *layered object integration*, allowing users to insert and customize objects, such as characters or props, across diverse images or scenes while preserving identity, context, and style. The system comprises a coordinator agent, **ChainArchitect** for CoT-driven layout planning, and the **Object Integration Network (OIN)** for seamless image editing using off-the-shelf T2I models without retraining. Through applications like batch collage editing and narrative scene generation, LayerCraft empowers non-experts to iteratively design, customize, and refine visual content with minimal manual effort. Code will be released upon acceptance.

Cite

Text

Zhang et al. "LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-layercraft/)

BibTeX

@inproceedings{zhang2025neurips-layercraft,
  title     = {{LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration}},
  author    = {Zhang, Yuyao and Li, Jinghao and Tai, Yu-Wing},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-layercraft/}
}