Ctrl-X: Controlling Structure and Appearance for Text-to-Image Generation Without Guidance

Abstract

Recent controllable generation approaches such as FreeControl and Diffusion Self-Guidance bring fine-grained spatial and appearance control to text-to-image (T2I) diffusion models without training auxiliary modules. However, these methods optimize the latent embedding for each type of score function with more diffusion steps, which makes generation time-consuming and limits their flexibility and use. This work presents Ctrl-X, a simple framework for T2I diffusion that controls structure and appearance without additional training or guidance. Ctrl-X introduces feed-forward structure control, which aligns the output with a structure image, and semantic-aware appearance transfer, which transfers appearance from a user-input image. Extensive qualitative and quantitative experiments illustrate the superior performance of Ctrl-X on various condition inputs and model checkpoints. In particular, Ctrl-X supports novel structure and appearance control with arbitrary condition images of any modality, exhibits superior image quality and appearance transfer compared to existing works, and provides instant plug-and-play functionality to any T2I and text-to-video (T2V) diffusion model. See our project page for the code and an overview of the results: https://genforce.github.io/ctrl-x

Cite

Text

Lin et al. "Ctrl-X: Controlling Structure and Appearance for Text-to-Image Generation Without Guidance." Neural Information Processing Systems, 2024. doi:10.52202/079017-4095

Markdown

[Lin et al. "Ctrl-X: Controlling Structure and Appearance for Text-to-Image Generation Without Guidance." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lin2024neurips-ctrlx/) doi:10.52202/079017-4095

BibTeX

@inproceedings{lin2024neurips-ctrlx,
  title     = {{Ctrl-X: Controlling Structure and Appearance for Text-to-Image Generation Without Guidance}},
  author    = {Lin, Kuan Heng and Mo, Sicheng and Klingher, Ben and Mu, Fangzhou and Zhou, Bolei},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4095},
  url       = {https://mlanthology.org/neurips/2024/lin2024neurips-ctrlx/}
}