Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models

Abstract

While there has been significant progress in customizing text-to-image generation models, generating images that combine multiple personalized concepts remains challenging. In this work, we introduce Concept Weaver, a method for composing customized text-to-image diffusion models at inference time. Specifically, the method breaks the process into two steps: creating a template image aligned with the semantics of the input prompt, and then personalizing the template using a concept fusion strategy. The fusion strategy incorporates the appearance of the target concepts into the template image while retaining its structural details. The results indicate that our method can generate multiple custom concepts with higher identity fidelity than alternative approaches. Furthermore, the method is shown to seamlessly handle more than two concepts and to closely follow the semantic meaning of the input prompt without blending appearances across different subjects.
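To make the two-step flow concrete, the following is a minimal Python sketch of the inference-time pipeline described above. It is an illustration under stated assumptions, not the authors' implementation: the function names generate_template and fuse_concepts are hypothetical, the template step uses Hugging Face diffusers' StableDiffusionPipeline purely for illustration, and the fusion step is a stub standing in for the paper's appearance-fusion strategy.

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

# Pick a device/dtype so the sketch runs with or without a GPU.
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE = torch.float16 if DEVICE == "cuda" else torch.float32


def generate_template(prompt: str) -> Image.Image:
    """Step 1: create a template image aligned with the prompt's semantics.

    Uses an off-the-shelf diffusion model; the paper's method is agnostic
    to how the template is produced.
    """
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=DTYPE
    ).to(DEVICE)
    return pipe(prompt).images[0]


def fuse_concepts(template: Image.Image, concept_models: dict) -> Image.Image:
    """Step 2 (hypothetical stub): inject each personalized concept's
    appearance into the template while retaining its structural details.

    The actual fusion strategy operates on diffusion features; this stub
    only shows the control flow of applying one concept model per subject.
    """
    image = template
    for name, model in concept_models.items():
        # In the real method, `model` would be a concept-tuned network
        # applied to the region of the template matching `name`.
        image = model(image) if callable(model) else image
    return image


if __name__ == "__main__":
    prompt = "a photo of my dog and my cat on a sofa"  # example prompt
    template = generate_template(prompt)
    # With an empty concept dict, the template passes through unchanged.
    result = fuse_concepts(template, concept_models={})

One design point the sketch highlights: because fusion happens at inference time on a fixed template, each concept's appearance is confined to its own region, which is how the method avoids blending appearances across subjects.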

Cite

Text

Kwon et al. "Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00848

Markdown

[Kwon et al. "Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/kwon2024cvpr-concept/) doi:10.1109/CVPR52733.2024.00848

BibTeX

@inproceedings{kwon2024cvpr-concept,
  title     = {{Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models}},
  author    = {Kwon, Gihyun and Jenni, Simon and Li, Dingzeyu and Lee, Joon-Young and Ye, Jong Chul and Heilbron, Fabian Caba},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {8880--8889},
  doi       = {10.1109/CVPR52733.2024.00848},
  url       = {https://mlanthology.org/cvpr/2024/kwon2024cvpr-concept/}
}