FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Abstract

Recent approaches such as ControlNet offer users fine-grained spatial control over text-to-image (T2I) diffusion models. However, auxiliary modules have to be trained for each spatial condition type, model architecture, and checkpoint, putting them at odds with the diverse intents and preferences a human designer would like to convey to AI models during the content creation process. In this work, we present FreeControl, a training-free approach for controllable T2I generation that supports multiple conditions, architectures, and checkpoints simultaneously. FreeControl enforces structure guidance to facilitate global alignment with a guidance image, and appearance guidance to collect visual details from images generated without control. Extensive qualitative and quantitative experiments demonstrate the superior performance of FreeControl across a variety of pre-trained T2I models. In particular, FreeControl enables convenient training-free control over many different architectures and checkpoints, handles challenging input conditions on which most existing training-free methods fail, and achieves synthesis quality competitive with training-based approaches. Project page: https://genforce.github.io/freecontrol/.

Cite

Text

Mo et al. "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00713

Markdown

[Mo et al. "FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/mo2024cvpr-freecontrol/) doi:10.1109/CVPR52733.2024.00713

BibTeX

@inproceedings{mo2024cvpr-freecontrol,
  title     = {{FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition}},
  author    = {Mo, Sicheng and Mu, Fangzhou and Lin, Kuan Heng and Liu, Yanli and Guan, Bochen and Li, Yin and Zhou, Bolei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {7465-7475},
  doi       = {10.1109/CVPR52733.2024.00713},
  url       = {https://mlanthology.org/cvpr/2024/mo2024cvpr-freecontrol/}
}