Decoupled Textual Embeddings for Customized Image Generation

Abstract

Customized text-to-image generation, which aims to learn user-specified concepts from a few images, has drawn significant attention recently. However, existing methods usually suffer from overfitting and entangle subject-unrelated information (e.g., background and pose) with the learned concept, limiting the potential to compose the concept into new scenes. To address these issues, we propose DETEX, a novel approach that learns a disentangled concept embedding for flexible customized text-to-image generation. Unlike conventional methods that learn a single concept embedding from the given images, DETEX represents each image using multiple word embeddings during training, i.e., a learnable image-shared subject embedding and several image-specific subject-unrelated embeddings. To decouple irrelevant attributes (i.e., background and pose) from the subject embedding, we further present several attribute mappers that encode each image as several image-specific subject-unrelated embeddings. To encourage these unrelated embeddings to capture the irrelevant information, we combine them with the corresponding attribute words and propose a joint training strategy to facilitate the disentanglement. During inference, we use only the subject embedding for image generation, while selectively using image-specific embeddings to retain image-specific attributes. Extensive experiments demonstrate that the subject embedding obtained by our method can faithfully represent the target concept, while showing superior editability compared to state-of-the-art methods. Our code will be available at https://github.com/PrototypeNx/DETEX.
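The split between a shared subject embedding and per-image, subject-unrelated embeddings can be illustrated at the prompt level. The following is a minimal sketch under stated assumptions, not the authors' implementation: the placeholder token names (`<subject>`, `<background_0>`, `<pose_0>`), the prompt templates, and the helper functions are all hypothetical, and the attribute mappers and the joint training objective are not modeled.

```python
from dataclasses import dataclass

# Hypothetical placeholder for the learnable, image-shared subject embedding.
SUBJECT_TOKEN = "<subject>"


@dataclass
class ImageTokens:
    """Hypothetical per-image placeholders for subject-unrelated embeddings."""
    image_id: int
    attributes: tuple = ("background", "pose")

    def token(self, attribute: str) -> str:
        # One placeholder per (attribute, image) pair, e.g. "<pose_3>".
        return f"<{attribute}_{self.image_id}>"


def training_prompt(img: ImageTokens) -> str:
    """Pair each image-specific placeholder with its attribute word (assumed template)."""
    parts = [f"{img.token(a)} {a}" for a in img.attributes]
    return f"a photo of {SUBJECT_TOKEN} with " + " and ".join(parts)


def inference_prompt(scene: str, img: ImageTokens = None, keep: tuple = ()) -> str:
    """Use only the subject token by default; optionally retain chosen image-specific attributes."""
    prompt = f"a photo of {SUBJECT_TOKEN} {scene}"
    if img is not None and keep:
        prompt += " with " + " and ".join(f"{img.token(a)} {a}" for a in keep)
    return prompt


if __name__ == "__main__":
    print(training_prompt(ImageTokens(0)))
    print(inference_prompt("on the beach"))
    print(inference_prompt("on the beach", ImageTokens(2), keep=("pose",)))
```

In this sketch, the training prompt mentions every image-specific placeholder next to its attribute word, while the default inference prompt contains only the subject placeholder, mirroring the abstract's description of dropping the unrelated embeddings unless an image-specific attribute should be retained.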

Cite

Text

Cai et al. "Decoupled Textual Embeddings for Customized Image Generation." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I2.27850

Markdown

[Cai et al. "Decoupled Textual Embeddings for Customized Image Generation." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/cai2024aaai-decoupled/) doi:10.1609/AAAI.V38I2.27850

BibTeX

@inproceedings{cai2024aaai-decoupled,
  title     = {{Decoupled Textual Embeddings for Customized Image Generation}},
  author    = {Cai, Yufei and Wei, Yuxiang and Ji, Zhilong and Bai, Jinfeng and Han, Hu and Zuo, Wangmeng},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {909--917},
  doi       = {10.1609/AAAI.V38I2.27850},
  url       = {https://mlanthology.org/aaai/2024/cai2024aaai-decoupled/}
}