CookGAN: Causality Based Text-to-Image Synthesis

Abstract

This paper addresses the problem of text-to-image synthesis from a new perspective, i.e., the cause-and-effect chain in image generation. Causality is a common phenomenon in cooking: the appearance of a dish changes depending on the cooking actions and ingredients. The challenge of synthesis is that a generated image should depict the visual result of action-on-object. This paper presents a new network architecture, CookGAN, that mimics the visual effect in the causality chain, preserves fine-grained details, and progressively upsamples the image. In particular, a cooking simulator sub-network is proposed to incrementally make changes to food images based on the interaction between ingredients and cooking methods over a series of steps. Experiments on Recipe1M verify that CookGAN manages to generate food images with a reasonably impressive inception score. Furthermore, the images are semantically interpretable and manipulable.

Cite

Text

Zhu and Ngo. "CookGAN: Causality Based Text-to-Image Synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00556

Markdown

[Zhu and Ngo. "CookGAN: Causality Based Text-to-Image Synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/zhu2020cvpr-cookgan/) doi:10.1109/CVPR42600.2020.00556

BibTeX

@inproceedings{zhu2020cvpr-cookgan,
  title     = {{CookGAN: Causality Based Text-to-Image Synthesis}},
  author    = {Zhu, Bin and Ngo, Chong-Wah},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00556},
  url       = {https://mlanthology.org/cvpr/2020/zhu2020cvpr-cookgan/}
}