T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks

Abstract

Current methods for single-image depth estimation use training datasets with real image-depth pairs or stereo pairs, which are not easy to acquire. We propose a framework, trained on synthetic image-depth pairs and unpaired real images, that comprises an image translation network for enhancing realism of input images, followed by a depth prediction network. A key idea is having the first network act as a wide-spectrum input translator, taking in either synthetic or real images, and ideally producing minimally modified realistic images. This is done via a reconstruction loss when the training input is real, and a GAN loss when synthetic, removing the need for heuristic self-regularization. The second network is trained on a task loss for synthetic image-depth pairs, with an extra GAN loss to unify real and synthetic feature distributions. Importantly, the framework can be trained end-to-end, leading to good results, even surpassing early deep-learning methods that use real paired data.
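To make the loss composition concrete, below is a minimal PyTorch-style sketch of one generator update. The network classes, layer sizes, and loss weights here are placeholder assumptions for illustration, not the authors' architecture or hyperparameters; only the arrangement of losses (reconstruction on real inputs, image-level GAN on translated synthetic images, depth task loss on synthetic ground truth, and a feature-level GAN) follows the description above.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Translator(nn.Module):
    """G: maps a synthetic or real image to a 'realistic' image (tiny stand-in)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, 3, 3, padding=1))
    def forward(self, x):
        return torch.tanh(self.net(x))

class DepthNet(nn.Module):
    """f: maps a translated image to a depth map; also exposes its features."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.decoder = nn.Conv2d(16, 1, 3, padding=1)
    def forward(self, x):
        feat = self.encoder(x)
        return self.decoder(feat), feat

class Discriminator(nn.Module):
    """Patch discriminator, reused at image level (3 ch) and feature level (16 ch)."""
    def __init__(self, in_ch):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(in_ch, 16, 3, stride=2),
                                 nn.LeakyReLU(0.2),
                                 nn.Conv2d(16, 1, 3))
    def forward(self, x):
        return self.net(x)

def gan_loss(D, x, target_real):
    logits = D(x)
    target = torch.ones_like(logits) if target_real else torch.zeros_like(logits)
    return F.binary_cross_entropy_with_logits(logits, target)

G, f = Translator(), DepthNet()
D_img, D_feat = Discriminator(3), Discriminator(16)

# One generator step: a synthetic (image, depth) pair plus an unpaired real image.
x_syn = torch.rand(2, 3, 64, 64)   # synthetic image
d_syn = torch.rand(2, 1, 64, 64)   # its ground-truth depth
x_real = torch.rand(2, 3, 64, 64)  # unpaired real image

fake = G(x_syn)        # synthetic -> realistic translation
ident = G(x_real)      # a real input should pass through minimally modified
pred, feat_fake = f(fake)

loss_recon = F.l1_loss(ident, x_real)              # reconstruction loss (real input)
loss_gan_img = gan_loss(D_img, fake, True)         # GAN loss (synthetic input)
loss_task = F.l1_loss(pred, d_syn)                 # depth task loss on synthetic GT
loss_gan_feat = gan_loss(D_feat, feat_fake, True)  # unify feature distributions

# Weights are illustrative; the paper tunes its own trade-offs.
loss_G = loss_task + loss_gan_img + 10.0 * loss_recon + 0.1 * loss_gan_feat
loss_G.backward()
# D_img / D_feat are updated in a separate step on real vs. detached fake samples,
# e.g. gan_loss(D_img, x_real, True) + gan_loss(D_img, fake.detach(), False).

Because every loss above is differentiable with respect to both networks, a single backward pass trains the translator and the depth predictor jointly, which is what the abstract means by end-to-end training.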

Cite

Text

Zheng et al. "T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks." Proceedings of the European Conference on Computer Vision (ECCV), 2018.

Markdown

[Zheng et al. "T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/zheng2018eccv-t2net/)

BibTeX

@inproceedings{zheng2018eccv-t2net,
  title     = {{T2Net: Synthetic-to-Realistic Translation for Solving Single-Image Depth Estimation Tasks}},
  author    = {Zheng, Chuanxia and Cham, Tat-Jen and Cai, Jianfei},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  url       = {https://mlanthology.org/eccv/2018/zheng2018eccv-t2net/}
}