RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis from Prior Knowledge

Abstract

Text-to-image synthesis is the challenging task of generating realistic images from a textual description. Such a description usually contains far less information than the corresponding image, so it is ambiguous and abstract. Because the text describes only part of a scene, the generator must implicitly fill in the remaining details, which complicates generation and leads to low-quality images. To address this problem, we propose RiFeGAN, a novel rich-feature-generating text-to-image synthesis method that enriches the given description. To provide additional visual details while avoiding conflicts, RiFeGAN uses an attention-based caption matching model to select and refine compatible candidate captions from prior knowledge. Given the enriched captions, RiFeGAN uses self-attentional embedding mixtures to extract features across them effectively and to handle diverging features. It then uses multi-caption attentional generative adversarial networks to synthesize images from those features. Experiments on widely used datasets show that the models can effectively generate images from enriched captions and significantly improve the results.
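
To make the caption-enrichment idea concrete, here is a minimal sketch of selecting compatible candidate captions from a pool of prior captions. It is not the paper's attention-based matching model: it swaps in plain bag-of-words cosine similarity as a stand-in, and the names `bag_of_words`, `cosine`, and `select_compatible_captions`, the parameter `k`, and the example captions are all hypothetical.

```python
import math
from collections import Counter


def bag_of_words(caption):
    """Lowercased, whitespace-tokenized word counts (a crude stand-in for a learned text encoder)."""
    return Counter(caption.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_compatible_captions(query, pool, k=2):
    """Return up to k captions from the pool that are most similar to the query.

    Assumption: similarity is a proxy for compatibility. The abstract describes
    an attention-based matching model; this sketch does not reproduce it.
    """
    query_vec = bag_of_words(query)
    scored = [(cosine(query_vec, bag_of_words(c)), c) for c in pool if c != query]
    # Highest similarity first; Python's sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for score, caption in scored[:k] if score > 0]


# Hypothetical prior-knowledge pool, e.g. captions taken from a training set.
pool = [
    "a small bird with a red head and white belly",
    "a large yellow flower with round petals",
    "this bird has a red head and a short beak",
]
enriched = select_compatible_captions("a small bird with a red head", pool, k=2)
```

In this toy setting, the selected captions add details such as belly color or beak shape that the query omits. A full pipeline in the spirit of the abstract would then pool features across the enriched captions before image synthesis.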

Cite

Text

Cheng et al. "RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis from Prior Knowledge." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.01092

Markdown

[Cheng et al. "RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis from Prior Knowledge." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/cheng2020cvpr-rifegan/) doi:10.1109/CVPR42600.2020.01092

BibTeX

@inproceedings{cheng2020cvpr-rifegan,
  title     = {{RiFeGAN: Rich Feature Generation for Text-to-Image Synthesis from Prior Knowledge}},
  author    = {Cheng, Jun and Wu, Fuxiang and Tian, Yanling and Wang, Lei and Tao, Dapeng},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.01092},
  url       = {https://mlanthology.org/cvpr/2020/cheng2020cvpr-rifegan/}
}