Parrot: Pareto-Optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Abstract

Recent works have demonstrated that using reinforcement learning (RL) with multiple quality rewards can improve the quality of generated images in text-to-image (T2I) generation. However, manually adjusting reward weights poses challenges and may cause over-optimization in certain metrics. To solve this, we propose Parrot, which addresses the issue through multi-objective optimization and introduces an effective multi-reward optimization strategy to approximate Pareto optimality. Utilizing batch-wise Pareto-optimal selection, Parrot automatically identifies the optimal trade-off among different rewards. We use the novel multi-reward optimization algorithm to jointly optimize the T2I model and a prompt expansion network, resulting in a significant improvement in image quality and also allowing the trade-off among different rewards to be controlled with a reward-related prompt during inference. Furthermore, we introduce original prompt-centered guidance at inference time, ensuring fidelity to user input after prompt expansion. Extensive experiments and a user study validate the superiority of Parrot over several baselines across various quality criteria, including aesthetics, human preference, text-image alignment, and image sentiment.
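
The abstract names batch-wise Pareto-optimal selection without giving details. As a rough illustration only, the sketch below applies the standard definition of non-dominance to a batch of per-sample reward vectors: a sample is kept if no other sample in the batch is at least as good on every reward and strictly better on at least one. The function name `non_dominated_mask`, the reward values, and the two-reward setup are assumptions made for this example. They are not taken from the paper, and how Parrot uses the selected samples during training is not shown here.

```python
import numpy as np


def non_dominated_mask(rewards: np.ndarray) -> np.ndarray:
    """Return a boolean mask marking the non-dominated rows of `rewards`.

    `rewards` has shape (batch_size, num_rewards); higher is better.
    This is the textbook Pareto-front definition, not the paper's code.
    """
    batch_size = rewards.shape[0]
    mask = np.ones(batch_size, dtype=bool)
    for i in range(batch_size):
        # Row j dominates row i if it is >= on all rewards and > on at least one.
        at_least_as_good = np.all(rewards >= rewards[i], axis=1)
        strictly_better = np.any(rewards > rewards[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask


# Hypothetical batch of five samples scored by two rewards
# (for example, aesthetics and text-image alignment).
batch_rewards = np.array([
    [0.9, 0.2],
    [0.5, 0.5],
    [0.4, 0.4],
    [0.2, 0.9],
    [0.9, 0.1],
])
mask = non_dominated_mask(batch_rewards)
pareto_samples = batch_rewards[mask]
```

In this toy batch, the third sample is dominated by the second and the fifth by the first, so three samples remain on the front. The pairwise loop is quadratic in batch size, which is usually acceptable for the batch sizes typical in RL fine-tuning.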

Cite

Text

Lee et al. "Parrot: Pareto-Optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72920-1_26

Markdown

[Lee et al. "Parrot: Pareto-Optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/lee2024eccv-parrot/) doi:10.1007/978-3-031-72920-1_26

BibTeX

@inproceedings{lee2024eccv-parrot,
  title     = {{Parrot: Pareto-Optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation}},
  author    = {Lee, Seung Hyun and Li, Yinxiao and Ke, Junjie and Yoo, Innfarn and Zhang, Han and Yu, Jiahui and Wang, Qifei and Deng, Fei and Entis, Glenn and He, Junfeng and Li, Gang and Kim, Sangpil and Essa, Irfan and Yang, Feng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72920-1_26},
  url       = {https://mlanthology.org/eccv/2024/lee2024eccv-parrot/}
}