MultiBooth: Towards Generating All Your Concepts in an Image from Text

Zhu, Chenyang; Li, Kai; Ma, Yue; He, Chunming; Li, Xiu

doi:10.1609/AAAI.V39I10.33187

AAAI 2025 pp. 10923-10931

Abstract

This paper introduces MultiBooth, a method that generates images from text prompts containing multiple user-provided concepts. Although diffusion models have significantly advanced customized text-to-image generation, existing methods often struggle with multi-concept scenarios due to low concept fidelity and high inference cost. MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. During the single-concept learning phase, we employ a multi-modal image encoder and an efficient concept encoding technique to learn a concise and discriminative representation for each concept. In the multi-concept integration phase, we use bounding boxes to define the generation area for each concept within the cross-attention map. This enables each concept to be generated within its specified region, thereby facilitating the formation of multi-concept images. This strategy not only improves concept fidelity but also reduces additional inference cost. MultiBooth surpasses various baselines in both qualitative and quantitative evaluations, showcasing its superior performance and computational efficiency.
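
The idea of using bounding boxes to define each concept's generation area within the cross-attention map can be illustrated with a small sketch. This is not the authors' implementation: it assumes the simplest possible reading, in which a concept token's attention is zeroed outside its box. The token names (`<dog>`, `<cat>`), the 4 x 4 grid, the normalized box format, and the helpers `box_mask` and `restrict_to_boxes` are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized to [0, 1]
AttnMap = List[List[float]]              # one H x W attention map for a single token


def box_mask(height: int, width: int, box: Box) -> List[List[int]]:
    """Return an H x W grid with 1 inside the box and 0 outside."""
    x0, y0, x1, y1 = box
    # Convert normalized corners to cell indices; the end index is exclusive.
    c0, c1 = int(x0 * width), int(x1 * width)
    r0, r1 = int(y0 * height), int(y1 * height)
    return [[1 if r0 <= r < r1 and c0 <= c < c1 else 0 for c in range(width)]
            for r in range(height)]


def restrict_to_boxes(attn: Dict[str, AttnMap],
                      boxes: Dict[str, Box]) -> Dict[str, AttnMap]:
    """Zero each concept token's attention outside its bounding box.

    Assumption: tokens without a box (ordinary prompt words) pass through
    unchanged. The paper's actual integration step may differ.
    """
    out: Dict[str, AttnMap] = {}
    for token, amap in attn.items():
        if token not in boxes:
            out[token] = [row[:] for row in amap]  # copy, no masking
            continue
        mask = box_mask(len(amap), len(amap[0]), boxes[token])
        out[token] = [[v * m for v, m in zip(row, mrow)]
                      for row, mrow in zip(amap, mask)]
    return out


# Hypothetical 4 x 4 latent grid with two concept tokens and one plain word.
uniform = [[1.0] * 4 for _ in range(4)]
attn = {"<dog>": uniform, "<cat>": uniform, "park": uniform}
boxes = {"<dog>": (0.0, 0.0, 0.5, 1.0), "<cat>": (0.5, 0.0, 1.0, 1.0)}
restricted = restrict_to_boxes(attn, boxes)
```

In this toy setup the `<dog>` token keeps attention only in the left half of the grid and `<cat>` only in the right half, while the plain word `park` still attends everywhere. Because the restriction is a per-token elementwise mask, it adds little work per denoising step under these assumptions.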

Cite

Text

Zhu et al. "MultiBooth: Towards Generating All Your Concepts in an Image from Text." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I10.33187

Markdown

[Zhu et al. "MultiBooth: Towards Generating All Your Concepts in an Image from Text." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhu2025aaai-multibooth/) doi:10.1609/AAAI.V39I10.33187

BibTeX

@inproceedings{zhu2025aaai-multibooth,
  title     = {{MultiBooth: Towards Generating All Your Concepts in an Image from Text}},
  author    = {Zhu, Chenyang and Li, Kai and Ma, Yue and He, Chunming and Li, Xiu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {10923-10931},
  doi       = {10.1609/AAAI.V39I10.33187},
  url       = {https://mlanthology.org/aaai/2025/zhu2025aaai-multibooth/}
}