3D Noise and Adversarial Supervision Is All You Need for Multi-Modal Semantic Image Synthesis
Abstract
Semantic image synthesis models suffer from training instabilities and poor image quality when trained with adversarial supervision alone. Historically, this was alleviated via an additional VGG-based perceptual loss. Instead, we propose a new, simplified GAN model that needs only adversarial supervision to achieve high-quality results. In doing so, we also show that VGG supervision decreases image diversity and can hurt image quality. We achieve the improvement by re-designing the discriminator as a semantic segmentation network. The resulting stronger supervision makes the VGG loss obsolete. Moreover, in contrast to previous work, we enable high-quality multi-modal image synthesis through a novel noise sampling scheme. Compared to the state of the art, we achieve an average improvement of 6 points in FID and 7 points in mIoU.
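The abstract only names its two ingredients. As a rough illustration, here is a minimal pure-Python sketch of how they could fit together. It assumes details this abstract does not state: (1) the "3D" noise is a noise tensor holding one vector per pixel, concatenated channel-wise with the one-hot label map so that it can be re-sampled locally; and (2) the segmentation discriminator predicts N semantic classes plus one extra "fake" class and is trained with per-pixel cross-entropy. All function names are hypothetical, nested lists stand in for tensors, and the convolutional networks themselves are omitted.

```python
import math
import random


def labels_with_3d_noise(label_map, num_classes, noise_dim, rng, resample_class=None):
    """Concatenate a one-hot label map with a spatial noise tensor, per pixel.

    label_map: H x W nested list of class ids in [0, num_classes).
    Returns an H x W nested list of vectors of length num_classes + noise_dim.
    Assumption (not from the abstract): one Gaussian vector is shared by all
    pixels, and pixels of `resample_class` (if given) get their own fresh
    vector, illustrating the local re-sampling a spatial noise tensor permits.
    """
    shared = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    local = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    out = []
    for row in label_map:
        out_row = []
        for c in row:
            one_hot = [1.0 if k == c else 0.0 for k in range(num_classes)]
            z = local if c == resample_class else shared
            out_row.append(one_hot + z)  # list concatenation = channel concat
        out.append(out_row)
    return out


def pixel_cross_entropy(logits, target):
    """Return -log softmax(logits)[target], computed stably via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_sum_exp - logits[target]


def discriminator_loss(real_logits, fake_logits, label_map, num_classes):
    """(N+1)-class segmentation loss for a segmentation-style discriminator.

    real_logits, fake_logits: H x W nested lists of length num_classes + 1.
    Pixels of a real image should be assigned their semantic class; every
    pixel of a generated image should be assigned the extra "fake" class N.
    Assumption: plain mean over pixels, no class-balancing weights.
    """
    fake_class = num_classes
    total, count = 0.0, 0
    for r_row, f_row, l_row in zip(real_logits, fake_logits, label_map):
        for r, f, c in zip(r_row, f_row, l_row):
            total += pixel_cross_entropy(r, c) + pixel_cross_entropy(f, fake_class)
            count += 1
    return total / count


def generator_loss(fake_logits, label_map):
    """The generator wants each generated pixel segmented as its input class."""
    losses = [pixel_cross_entropy(f, c)
              for f_row, l_row in zip(fake_logits, label_map)
              for f, c in zip(f_row, l_row)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    labels = [[0, 1], [1, 2]]  # 2 x 2 label map with N = 3 classes
    x = labels_with_3d_noise(labels, num_classes=3, noise_dim=4, rng=rng, resample_class=2)
    print(len(x), len(x[0]), len(x[0][0]))  # 2 2 7
    uniform = [[[0.0] * 4 for _ in range(2)] for _ in range(2)]  # N + 1 = 4 logits
    print(discriminator_loss(uniform, uniform, labels, num_classes=3))  # approx. 2 * log(4)
    print(generator_loss(uniform, labels))  # approx. log(4)
```

Keeping one noise vector for the whole image while re-sampling it only inside one class is what a per-pixel noise layout makes possible; this is one plausible reading of how a noise sampling scheme can yield multi-modal outputs. A real implementation would use a tensor library and convolutional networks, and may weight the loss per class.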
Cite
Text
Sushko et al. "3D Noise and Adversarial Supervision Is All You Need for Multi-Modal Semantic Image Synthesis." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-65414-6_39
Markdown
[Sushko et al. "3D Noise and Adversarial Supervision Is All You Need for Multi-Modal Semantic Image Synthesis." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/sushko2020eccvw-3d/) doi:10.1007/978-3-030-65414-6_39
BibTeX
@inproceedings{sushko2020eccvw-3d,
title = {{3D Noise and Adversarial Supervision Is All You Need for Multi-Modal Semantic Image Synthesis}},
author = {Sushko, Vadim and Schönfeld, Edgar and Zhang, Dan and Gall, Jürgen and Schiele, Bernt and Khoreva, Anna},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
  pages = {554--558},
doi = {10.1007/978-3-030-65414-6_39},
url = {https://mlanthology.org/eccvw/2020/sushko2020eccvw-3d/}
}