Harnessing the Conditioning Sensorium for Improved Image Translation
Abstract
Existing methods for multi-modal domain translation learn to embed the input images into a domain-invariant "content" space and a domain-specific "style" space from which novel images can be synthesized. Rather than learning to embed the RGB image from scratch, we propose deriving our content representation from conditioning data produced by pretrained off-the-shelf networks. Motivated by the inherent ambiguity of "content", which has different meanings depending on the desired level of abstraction, this approach gives intuitive control over which aspects of content are preserved across domains. We evaluate our method on traditional, well-aligned datasets such as CelebA-HQ, and propose two novel datasets for evaluation on more complex scenes: ClassicTV and FFHQ-WildCrops. Our approach, which we call Sensorium, enables higher-quality domain translation for complex scenes than prior work.
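To make the setup concrete, below is a minimal, hypothetical sketch of the general idea in PyTorch: a frozen, off-the-shelf conditioning network (e.g., a pretrained segmentation or keypoint model) supplies the spatial "content" representation, a small style encoder captures domain-specific appearance, and a generator synthesizes an image from the pair. Module names, shapes, and layer choices here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StyleEncoder(nn.Module):
    """Encodes an RGB image into a compact, domain-specific style vector."""
    def __init__(self, style_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, style_dim)

    def forward(self, img):
        return self.fc(self.net(img).flatten(1))

class Generator(nn.Module):
    """Maps a spatial conditioning ("content") map plus a style vector to an RGB image."""
    def __init__(self, cond_channels, style_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(cond_channels + style_dim, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, cond, style):
        # Broadcast the style vector over the spatial grid and concatenate with content.
        b, _, h, w = cond.shape
        style_map = style.view(b, -1, 1, 1).expand(b, style.shape[1], h, w)
        return self.net(torch.cat([cond, style_map], dim=1))

def translate(img, conditioning_net, style_encoder, generator):
    """Re-synthesize `img` from its fixed content and a (possibly swapped) style."""
    with torch.no_grad():                  # the off-the-shelf conditioning network stays frozen
        content = conditioning_net(img)    # e.g., segmentation logits or keypoint heatmaps
    style = style_encoder(img)             # replace with a target-domain style code to translate
    return generator(content, style)
```

In this formulation, swapping which pretrained conditioning network produces `content` (segmentation, keypoints, depth, etc.) is what gives control over which aspects of content are preserved across domains.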
Cite
Text
Nederhood et al. "Harnessing the Conditioning Sensorium for Improved Image Translation." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00668

Markdown

[Nederhood et al. "Harnessing the Conditioning Sensorium for Improved Image Translation." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/nederhood2021iccv-harnessing/) doi:10.1109/ICCV48922.2021.00668

BibTeX
@inproceedings{nederhood2021iccv-harnessing,
title = {{Harnessing the Conditioning Sensorium for Improved Image Translation}},
author = {Nederhood, Cooper and Kolkin, Nicholas and Fu, Deqing and Salavon, Jason},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {6752-6761},
doi = {10.1109/ICCV48922.2021.00668},
url = {https://mlanthology.org/iccv/2021/nederhood2021iccv-harnessing/}
}