InstructBooth: Instruction-Following Personalized Text-to-Image Generation
Abstract
Personalizing text-to-image models with a limited set of images of a specific subject has been explored in subject-specific image generation. However, existing methods often struggle to align with text prompts because they overfit to the limited training images. In this work, we introduce InstructBooth, a novel method designed to enhance image-text alignment in personalized text-to-image models without sacrificing personalization ability. Our approach first personalizes a text-to-image model with a small number of subject-specific images using a unique identifier. After personalization, we fine-tune the personalized model using reinforcement learning to maximize a reward that quantifies image-text alignment. Additionally, we propose complementary techniques to increase the synergy between these two processes. Our method demonstrates superior image-text alignment compared to existing baselines while maintaining high personalization ability. In human evaluations, InstructBooth outperforms existing baselines when all factors are considered together.
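The sketch below is a minimal, self-contained illustration of the second stage described above: fine-tuning an already-personalized generator with a REINFORCE-style policy-gradient update that maximizes an image-text alignment reward. It is not the authors' implementation; the PersonalizedGenerator module, the alignment_reward function, and the Gaussian sampling policy are hypothetical stand-ins (the paper's actual personalization step, reward model, and RL objective may differ).

```python
# Hedged sketch of RL fine-tuning for image-text alignment.
# All names here (PersonalizedGenerator, alignment_reward) are illustrative
# placeholders, not InstructBooth's actual components.
import torch
import torch.nn as nn


class PersonalizedGenerator(nn.Module):
    """Toy stand-in for a text-to-image model already personalized on a few subject images."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = nn.Linear(dim, dim)

    def forward(self, prompt_embedding: torch.Tensor):
        # Sample an "image" from a Gaussian policy and return its log-probability,
        # so a policy-gradient update can be applied.
        mean = self.net(prompt_embedding)
        dist = torch.distributions.Normal(mean, torch.ones_like(mean))
        sample = dist.sample()
        log_prob = dist.log_prob(sample).sum(dim=-1)
        return sample, log_prob


def alignment_reward(image: torch.Tensor, prompt_embedding: torch.Tensor) -> torch.Tensor:
    # Placeholder for an image-text alignment score (e.g., a CLIP-style similarity).
    return torch.cosine_similarity(image, prompt_embedding, dim=-1)


generator = PersonalizedGenerator()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-4)

for step in range(100):
    prompts = torch.randn(8, 16)                 # stand-in for encoded text prompts
    images, log_probs = generator(prompts)       # generate samples and their log-probs
    rewards = alignment_reward(images, prompts)  # score image-text alignment
    baseline = rewards.mean().detach()           # variance-reduction baseline
    # REINFORCE surrogate: increase log-probability of high-reward samples.
    loss = -((rewards - baseline).detach() * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice, the generator would be a personalized diffusion model and the reward a learned image-text alignment score, but the update structure shown here (sample, score, reward-weighted log-likelihood gradient) is the generic policy-gradient pattern the abstract refers to.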
Cite
Chae et al. "InstructBooth: Instruction-Following Personalized Text-to-Image Generation." ICML 2024 Workshops: FM-Wild, 2024. https://mlanthology.org/icmlw/2024/chae2024icmlw-instructbooth/

BibTeX
@inproceedings{chae2024icmlw-instructbooth,
title = {{InstructBooth: Instruction-Following Personalized Text-to-Image Generation}},
author = {Chae, Daewon and Park, Nokyung and Kim, Jinkyu and Lee, Kimin},
booktitle = {ICML 2024 Workshops: FM-Wild},
year = {2024},
url = {https://mlanthology.org/icmlw/2024/chae2024icmlw-instructbooth/}
}