CookGAN: Causality Based Text-to-Image Synthesis
Abstract
This paper addresses the problem of text-to-image synthesis from a new perspective, i.e., the cause-and-effect chain in image generation. Causality is a common phenomenon in cooking: the appearance of a dish changes depending on the cooking actions and ingredients. The challenge of synthesis is that a generated image should depict the visual result of action-on-object. This paper presents a new network architecture, CookGAN, that mimics the visual effect in the causality chain, preserves fine-grained details, and progressively upsamples the image. In particular, a cooking simulator sub-network is proposed to incrementally make changes to food images based on the interaction between ingredients and cooking methods over a series of steps. Experiments on Recipe1M verify that CookGAN manages to generate food images with a reasonably impressive inception score. Furthermore, the images are semantically interpretable and manipulable.
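The incremental, step-by-step update described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `simulate_cooking`, the attention over ingredients, the scalar gate, and all shapes are assumptions chosen only to show how an image feature might be changed once per cooking step.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def simulate_cooking(image_feat, ingredient_embs, step_embs):
    """Toy causality chain: update an image feature once per cooking step.

    image_feat:      (d,) initial image feature vector
    ingredient_embs: (n_ingredients, d) one embedding per ingredient
    step_embs:       (n_steps, d) one embedding per cooking step
    Returns a list with one intermediate feature vector per step.
    """
    state = image_feat.astype(float)
    history = []
    for step in step_embs:
        # Assumed attention: weight each ingredient by its similarity to the step.
        weights = softmax(ingredient_embs @ step)
        context = weights @ ingredient_embs
        # Assumed scalar gate: how strongly this step changes the current state.
        gate = 1.0 / (1.0 + np.exp(-(state @ step)))
        # Blend the previous state with the effect of the action on the ingredients.
        state = (1.0 - gate) * state + gate * np.tanh(context + step)
        history.append(state.copy())
    return history


# Random stand-ins for learned embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
d = 8
feats = simulate_cooking(
    rng.normal(size=d),       # initial image feature
    rng.normal(size=(3, d)),  # three ingredients
    rng.normal(size=(4, d)),  # four cooking steps
)
```

Each entry of `feats` is the image feature after one more cooking step, so the sequence mirrors the cause-and-effect chain; in a full model a decoder would upsample the final feature into an image.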
Cite
Text
Zhu and Ngo. "CookGAN: Causality Based Text-to-Image Synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00556
Markdown
[Zhu and Ngo. "CookGAN: Causality Based Text-to-Image Synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/zhu2020cvpr-cookgan/) doi:10.1109/CVPR42600.2020.00556
BibTeX
@inproceedings{zhu2020cvpr-cookgan,
title = {{CookGAN: Causality Based Text-to-Image Synthesis}},
author = {Zhu, Bin and Ngo, Chong-Wah},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020},
doi = {10.1109/CVPR42600.2020.00556},
url = {https://mlanthology.org/cvpr/2020/zhu2020cvpr-cookgan/}
}