Object-Driven Text-to-Image Synthesis via Adversarial Training
Abstract
In this paper, we propose Object-driven Attentive Generative Adversarial Networks (Obj-GANs) that allow attention-driven, multi-stage refinement for synthesizing complex images from text descriptions. With a novel object-driven attentive generative network, the Obj-GAN can synthesize salient objects by paying attention to their most relevant words in the text descriptions and their pre-generated class label. In addition, a novel object-wise discriminator based on the Fast R-CNN model is proposed to provide rich object-wise discrimination signals on whether the synthesized object matches the text description and the pre-generated class label. The proposed Obj-GAN significantly outperforms the previous state of the art in various metrics on the large-scale MS-COCO benchmark, increasing the inception score by 27% and decreasing the FID score by 11%. A thorough comparison between the classic grid attention and the new object-driven attention is provided through analyzing their mechanisms and visualizing their attention layers, offering insight into how the proposed model generates complex scenes in high quality.
Cite
Text
Li et al. "Object-Driven Text-to-Image Synthesis via Adversarial Training." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.01245
Markdown
[Li et al. "Object-Driven Text-to-Image Synthesis via Adversarial Training." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/li2019cvpr-objectdriven/) doi:10.1109/CVPR.2019.01245
BibTeX
@inproceedings{li2019cvpr-objectdriven,
title = {{Object-Driven Text-to-Image Synthesis via Adversarial Training}},
author = {Li, Wenbo and Zhang, Pengchuan and Zhang, Lei and Huang, Qiuyuan and He, Xiaodong and Lyu, Siwei and Gao, Jianfeng},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2019},
doi = {10.1109/CVPR.2019.01245},
url = {https://mlanthology.org/cvpr/2019/li2019cvpr-objectdriven/}
}