InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning
Abstract
Recent advances in personalized image generation have enabled pre-trained text-to-image models to learn new concepts from specific image sets. However, these methods often require extensive test-time finetuning for each new concept, which is inefficient in time and scales poorly. To address this challenge, we introduce InstantBooth, an approach that leverages existing text-to-image models for instantaneous text-guided image personalization, eliminating the need for test-time finetuning. This efficiency is achieved through two primary innovations. First, we use an image encoder that transforms input images into a global embedding to capture the general concept. Second, we integrate new adapter layers into the pre-trained model, enhancing its ability to capture intricate identity details while maintaining language coherence. Notably, our model is trained exclusively on text-image pairs, without relying on concept-specific paired images. When benchmarked against existing finetuning-based personalization techniques such as DreamBooth and Textual Inversion, InstantBooth not only performs comparably in language-image alignment, image quality, and identity preservation, but also delivers a 100-fold increase in processing speed.
Cite
Text
Shi et al. "InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00816
Markdown
[Shi et al. "InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/shi2024cvpr-instantbooth/) doi:10.1109/CVPR52733.2024.00816
BibTeX
@inproceedings{shi2024cvpr-instantbooth,
title = {{InstantBooth: Personalized Text-to-Image Generation Without Test-Time Finetuning}},
author = {Shi, Jing and Xiong, Wei and Lin, Zhe and Jung, Hyun Joon},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {8543-8552},
doi = {10.1109/CVPR52733.2024.00816},
url = {https://mlanthology.org/cvpr/2024/shi2024cvpr-instantbooth/}
}