Towards Good Practice for CNN-Based Monocular Depth Estimation
Abstract
Monocular depth estimation has gained increasing attention in recent years, and various techniques have been proposed to tackle this problem. In this work, we aim to provide a comprehensive study of the techniques widely used in monocular depth estimation and examine their individual influence on performance. More specifically, we study: 1) network architectures, including different combinations of encoders and decoders; 2) supervision losses, including fully supervised and self-supervised losses; and 3) other practices such as input resolution. The experiments are conducted on two commonly used public datasets, KITTI and NYU Depth v2. We also analyze the errors produced by different models to reveal the limitations of current methods. Furthermore, through a careful redesign, we present a depth estimation model that achieves competitive performance on KITTI and state-of-the-art performance on NYU Depth v2. Our code is publicly available at https://github.com/zenithfang/supervised_dispnet.
Cite
Text
Fang et al. "Towards Good Practice for CNN-Based Monocular Depth Estimation." Winter Conference on Applications of Computer Vision, 2020.
Markdown
[Fang et al. "Towards Good Practice for CNN-Based Monocular Depth Estimation." Winter Conference on Applications of Computer Vision, 2020.](https://mlanthology.org/wacv/2020/fang2020wacv-good/)
BibTeX
@inproceedings{fang2020wacv-good,
title = {{Towards Good Practice for CNN-Based Monocular Depth Estimation}},
author = {Fang, Zhicheng and Chen, Xiaoran and Chen, Yuhua and Van Gool, Luc},
booktitle = {Winter Conference on Applications of Computer Vision},
year = {2020},
url = {https://mlanthology.org/wacv/2020/fang2020wacv-good/}
}