DUNIT: Detection-Based Unsupervised Image-to-Image Translation

Abstract

Image-to-image translation has made great strides in recent years, with current techniques able to handle unpaired training images and to account for the multi-modality of the translation problem. Despite this, most methods treat the image as a whole, which makes their results for content-rich scenes less realistic. In this paper, we introduce a Detection-based Unsupervised Image-to-image Translation (DUNIT) approach that explicitly accounts for the object instances in the translation process. To this end, we extract separate representations for the global image and for the instances, which we then fuse into a common representation from which we generate the translated image. This allows us to preserve the detailed content of object instances, while still modeling the fact that we aim to produce an image of a single consistent scene. We introduce an instance consistency loss to maintain the coherence between the detections. Furthermore, by incorporating a detector into our architecture, we can still exploit object instances at test time. As evidenced by our experiments, this allows us to outperform the state-of-the-art unsupervised image-to-image translation methods. Our approach can also be used as an unsupervised domain adaptation strategy for object detection, where it likewise achieves state-of-the-art performance.
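
The abstract names an instance consistency loss that keeps detections coherent but does not give its form. As a rough illustration only, the sketch below assumes one plausible reading: the boxes detected in the source image and in the translated image are already matched one-to-one, and the loss is the mean absolute difference of their coordinates. The function name `instance_consistency_l1`, the `(x_min, y_min, x_max, y_max)` box format, and the index-based matching are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence

# Assumed box format: (x_min, y_min, x_max, y_max). Hypothetical, not from the paper.
Box = Sequence[float]


def instance_consistency_l1(source_boxes: List[Box],
                            translated_boxes: List[Box]) -> float:
    """Mean absolute coordinate difference between index-matched boxes.

    Assumption: box i in the source image corresponds to box i in the
    translated image. A real pipeline would need an explicit matching step.
    """
    if len(source_boxes) != len(translated_boxes):
        raise ValueError("boxes must be matched one-to-one")
    if not source_boxes:
        return 0.0  # no detections, nothing to penalize

    total = 0.0
    for src, out in zip(source_boxes, translated_boxes):
        # Sum of |difference| over the four coordinates of one box pair.
        total += sum(abs(a - b) for a, b in zip(src, out))

    # Average over all coordinates of all box pairs.
    return total / (4 * len(source_boxes))


if __name__ == "__main__":
    src = [[0.0, 0.0, 10.0, 10.0]]
    out = [[1.0, 0.0, 10.0, 12.0]]
    print(instance_consistency_l1(src, out))
```

Under this reading, a translation that leaves every detected object in place yields a loss of zero, and the loss grows as detections drift in the translated image.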

Cite

Text

Bhattacharjee et al. "DUNIT: Detection-Based Unsupervised Image-to-Image Translation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00484

Markdown

[Bhattacharjee et al. "DUNIT: Detection-Based Unsupervised Image-to-Image Translation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/bhattacharjee2020cvpr-dunit/) doi:10.1109/CVPR42600.2020.00484

BibTeX

@inproceedings{bhattacharjee2020cvpr-dunit,
  title     = {{DUNIT: Detection-Based Unsupervised Image-to-Image Translation}},
  author    = {Bhattacharjee, Deblina and Kim, Seungryong and Vizier, Guillaume and Salzmann, Mathieu},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00484},
  url       = {https://mlanthology.org/cvpr/2020/bhattacharjee2020cvpr-dunit/}
}