Harnessing the Conditioning Sensorium for Improved Image Translation

Abstract

Existing methods for multi-modal domain translation learn to embed the input images into a domain-invariant "content" space and a domain-specific "style" space from which novel images can be synthesized. Rather than learning to embed the RGB image from scratch, we propose deriving our content representation from conditioning data produced by pretrained off-the-shelf networks. Motivated by the inherent ambiguity of "content", whose meaning depends on the desired level of abstraction, this approach gives intuitive control over which aspects of content are preserved across domains. We evaluate our method on traditional, well-aligned datasets such as CelebA-HQ, and propose two novel datasets for evaluation on more complex scenes: ClassicTV and FFHQ-WildCrops. Our approach, which we call Sensorium, enables higher-quality domain translation for complex scenes than prior work.
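
The abstract's core idea, replacing a learned content encoder with features from a frozen, pretrained "sensor" network while only the style encoder and decoder are trained, can be sketched in a few lines of PyTorch. This is a minimal illustrative sketch, not the authors' implementation: the ResNet-18 backbone stands in for whatever off-the-shelf conditioning network (e.g. a segmenter or pose estimator) one chooses, and all module names, dimensions, and the toy decoder are assumptions.

import torch
import torch.nn as nn
import torchvision.models as models

class FrozenContentEncoder(nn.Module):
    """Content 'sensor': a frozen off-the-shelf network (ResNet-18 used here
    purely as a stand-in for the paper's conditioning networks)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # drop avgpool/fc
        for p in self.features.parameters():
            p.requires_grad = False  # content is fixed, not learned from scratch

    def forward(self, x):            # x: (B, 3, H, W)
        return self.features(x)      # (B, 512, H/32, W/32) content map

class StyleEncoder(nn.Module):
    """Domain-specific style code; this part (and the decoder) would be trained."""
    def __init__(self, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, style_dim),
        )

    def forward(self, x):
        return self.net(x)           # (B, style_dim)

class Decoder(nn.Module):
    """Toy decoder: broadcasts the style code over the content map and upsamples."""
    def __init__(self, content_ch=512, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(content_ch + style_dim, 128, 3, padding=1), nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(128, 3, 3, padding=1), nn.Tanh(),
        )

    def forward(self, content, style):
        b, _, h, w = content.shape
        style_map = style[:, :, None, None].expand(b, style.size(1), h, w)
        return self.net(torch.cat([content, style_map], dim=1))

# Usage: translate an image by pairing its fixed content with another domain's style.
if __name__ == "__main__":
    content_enc, style_enc, decoder = FrozenContentEncoder(), StyleEncoder(), Decoder()
    src = torch.randn(1, 3, 256, 256)   # image whose content should be preserved
    ref = torch.randn(1, 3, 256, 256)   # style reference from the target domain
    out = decoder(content_enc(src), style_enc(ref))
    print(out.shape)                    # torch.Size([1, 3, 256, 256])

Because the content branch is frozen, swapping in a different pretrained sensor (segmentation, keypoints, depth) changes which aspects of "content" survive translation without retraining the content encoder itself.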

Cite

Text

Nederhood et al. "Harnessing the Conditioning Sensorium for Improved Image Translation." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00668

Markdown

[Nederhood et al. "Harnessing the Conditioning Sensorium for Improved Image Translation." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/nederhood2021iccv-harnessing/) doi:10.1109/ICCV48922.2021.00668

BibTeX

@inproceedings{nederhood2021iccv-harnessing,
  title     = {{Harnessing the Conditioning Sensorium for Improved Image Translation}},
  author    = {Nederhood, Cooper and Kolkin, Nicholas and Fu, Deqing and Salavon, Jason},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {6752--6761},
  doi       = {10.1109/ICCV48922.2021.00668},
  url       = {https://mlanthology.org/iccv/2021/nederhood2021iccv-harnessing/}
}