InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning

Shi, Jing; Xiong, Wei; Lin, Zhe; Jung, Hyun Joon

doi:10.1109/CVPR52733.2024.00816

CVPR 2024 pp. 8543-8552

Abstract

Recent advances in personalized image generation have enabled pre-trained text-to-image models to learn new concepts from specific image sets. However, these methods often require extensive test-time finetuning for each new concept, which makes them inefficient in both time and scalability. To address this challenge, we introduce InstantBooth, an approach that leverages existing text-to-image models for instantaneous text-guided image personalization, eliminating the need for test-time finetuning. This efficiency is achieved through two primary innovations. First, we use an image encoder that transforms input images into a global embedding that captures the general concept. Second, we integrate new adapter layers into the pre-trained model, enhancing its ability to capture intricate identity details while maintaining language coherence. Notably, our model is trained exclusively on text-image pairs, without relying on concept-specific paired images. When benchmarked against existing finetuning-based personalization techniques such as DreamBooth and Textual Inversion, InstantBooth not only achieves comparable language-image alignment, image quality, and identity preservation, but is also 100 times faster.
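The two components named in the abstract can be illustrated with a toy, dependency-free sketch. Everything here is an assumption for illustration, not the paper's implementation: mean pooling stands in for the learned image encoder, overwriting a placeholder token's embedding is one plausible way to feed a global concept embedding into the prompt, and `gated_adapter` is a generic zero-initialized gated residual attention layer (similar in spirit to GLIGEN-style gated attention) rather than InstantBooth's exact adapter. The names `mean_pool`, `inject_concept`, `attend`, and `gated_adapter` are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(patch_features: List[Vector]) -> Vector:
    """Collapse per-patch image features into one global embedding.

    Hypothetical stand-in for a learned image encoder plus projection."""
    n = len(patch_features)
    return [sum(p[i] for p in patch_features) / n for i in range(len(patch_features[0]))]


def inject_concept(prompt: List[Vector], placeholder_index: int, concept: Vector) -> List[Vector]:
    """Copy the prompt token embeddings and overwrite the placeholder token's
    embedding with the global concept embedding (assumed injection scheme)."""
    out = [list(tok) for tok in prompt]
    out[placeholder_index] = list(concept)
    return out


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, tokens: List[Vector]) -> Vector:
    """Scaled dot-product attention of one query over tokens, with no learned
    projections: the tokens serve as both keys and values (simplification)."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([scale * sum(q * t for q, t in zip(query, tok)) for tok in tokens])
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(len(query))]


def gated_adapter(hidden: List[Vector], visual_tokens: List[Vector], gate: float) -> List[Vector]:
    """Residual update h + tanh(gate) * attend(h, visual_tokens) for every hidden state.

    With gate == 0.0 the layer is an exact identity, so inserting it does not
    change the pre-trained model until training moves the gate away from zero."""
    g = math.tanh(gate)
    return [[h_i + g * a_i for h_i, a_i in zip(h, attend(h, visual_tokens))] for h in hidden]


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy per-patch features of one image
    concept = mean_pool(patches)
    prompt = [[0.1, 0.2], [0.0, 0.0], [0.3, 0.4]]  # toy embeddings; index 1 is the placeholder
    personalized = inject_concept(prompt, 1, concept)
    assert gated_adapter(personalized, patches, gate=0.0) == personalized
    print(gated_adapter(personalized, patches, gate=0.5))
```

The zero-gate check reflects a common motivation for this kind of adapter: a freshly inserted layer leaves the pre-trained model's behavior untouched, and training can open the gate gradually to let fine identity details flow in.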

Cite

Text

Shi et al. "InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00816

Markdown

[Shi et al. "InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/shi2024cvpr-instantbooth/) doi:10.1109/CVPR52733.2024.00816

BibTeX

@inproceedings{shi2024cvpr-instantbooth,
  title     = {{InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning}},
  author    = {Shi, Jing and Xiong, Wei and Lin, Zhe and Jung, Hyun Joon},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {8543-8552},
  doi       = {10.1109/CVPR52733.2024.00816},
  url       = {https://mlanthology.org/cvpr/2024/shi2024cvpr-instantbooth/}
}