Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning
Abstract
Personalized text-to-image models allow users to generate images in varied styles (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based generation models, the visual structure and details of the object are often unexpectedly changed during the diffusion process. One major reason is that these diffusion-based approaches typically adopt a simple reconstruction objective during training, which can hardly enforce appropriate structural consistency between the generated and the reference images. To this end, we design a novel reinforcement learning framework for personalized text-to-image generation based on the deterministic policy gradient method, with which various objectives, differentiable or even non-differentiable, can be easily incorporated to supervise the diffusion models and improve the quality of the generated images. Experimental results on personalized text-to-image generation benchmark datasets demonstrate that our proposed approach outperforms existing state-of-the-art methods by a large margin on visual fidelity while maintaining text alignment. Our code is available at: https://github.com/wfanyue/DPG-T2I-Personalization.
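The mechanism that lets a deterministic policy gradient absorb non-differentiable objectives is a learned critic: the critic is fitted to observed rewards, and the generator is updated through the chain rule ∇θ Q(s, μθ(s)) = ∇a Q(s, a)|a=μθ(s) · ∇θ μθ(s). The following is a generic toy sketch of that idea only, not the authors' implementation (their code is in the linked repository). It assumes a one-step setting with no bootstrapping, a linear policy μ(s) = w·s + b standing in for the diffusion model, a hypothetical piecewise-constant `reward` standing in for a non-differentiable objective, and a hypothetical quadratic least-squares critic. All names and hyperparameters are illustrative, and the sketch requires NumPy.

```python
import numpy as np


def reward(s, a):
    """Hypothetical black-box reward: piecewise constant in a, so its gradient
    w.r.t. a is zero almost everywhere and cannot be back-propagated."""
    return -np.floor(10.0 * np.abs(a - 2.0 * s)) / 10.0


def features(s, a):
    """Quadratic features for a least-squares critic Q(s, a) = features @ coef."""
    return np.stack([np.ones_like(s), s, a, s * s, a * a, s * a], axis=1)


def dq_da(coef, s, a):
    """Analytic derivative of the quadratic critic with respect to the action a."""
    return coef[2] + 2.0 * coef[4] * a + coef[5] * s


def train(steps=200, batch=256, noise=0.3, lr=0.3, seed=0):
    rng = np.random.default_rng(seed)
    w, b = 0.0, 0.0  # deterministic linear policy mu(s) = w * s + b
    for _ in range(steps):
        s = rng.uniform(-1.0, 1.0, size=batch)
        # Gaussian exploration noise around the deterministic action
        a = w * s + b + noise * rng.standard_normal(batch)
        r = reward(s, a)
        # Critic: fit Q(s, a) to the observed rewards (one-step, no bootstrapping)
        coef, *_ = np.linalg.lstsq(features(s, a), r, rcond=None)
        # Actor (DPG): dQ/da at a = mu(s), times dmu/dw = s and dmu/db = 1
        g = dq_da(coef, s, w * s + b)
        w += lr * np.mean(g * s)
        b += lr * np.mean(g)
    return w, b


if __name__ == "__main__":
    w, b = train()
    print(f"learned policy: mu(s) = {w:.3f} * s + {b:.3f} (target: 2 * s + 0)")
```

Because `reward` is piecewise constant, back-propagating through it directly would give no signal; the fitted critic supplies a smooth surrogate whose action-gradient still points toward higher reward. Under these assumed settings the learned parameters should approach w ≈ 2 and b ≈ 0.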
Cite
Wei et al. "Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73383-3_23
@inproceedings{wei2024eccv-powerful,
title = {{Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning}},
author = {Wei, Fanyue and Zeng, Wei and Li, Zhenyang and Yin, Dawei and Duan, Lixin and Li, Wen},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73383-3_23},
url = {https://mlanthology.org/eccv/2024/wei2024eccv-powerful/}
}