ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems
Abstract
In this paper we present ActiveStereoNet, the first deep learning solution for active stereo systems. Due to the lack of ground truth, our method is fully self-supervised, yet it produces precise depth with a subpixel precision of 1/30th of a pixel; it does not suffer from the common over-smoothing issues of previous approaches; it preserves the edges; and it explicitly handles occlusions. We introduce a novel reconstruction loss that is more robust to noise and texture-less patches, and is invariant to illumination changes. The proposed loss is optimized using a window-based cost aggregation with an adaptive support weight scheme. This cost aggregation is edge-preserving and smooths the loss function, which is key to allowing the network to reach compelling results. Finally, we show how the task of predicting invalid regions, such as occlusions, can be trained end-to-end without ground truth. This component is crucial to reduce blur and particularly improves predictions along depth discontinuities. Extensive quantitative and qualitative evaluations on real and synthetic data demonstrate state-of-the-art results in many challenging scenes.
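The window-based cost aggregation with adaptive support weights mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `asw_aggregate`, the exponential intensity-similarity weight, and the defaults `radius=2` and `sigma=2.0` are all assumptions chosen for illustration. The idea shown is that each pixel's cost is averaged over a window, but neighbours whose intensity differs from the centre pixel contribute little, so the smoothing does not cross image edges.

```python
import numpy as np

def asw_aggregate(cost, image, radius=2, sigma=2.0):
    """Average a per-pixel cost map over a (2*radius+1)^2 window.

    Each neighbour is weighted by exp(-|I_centre - I_neighbour| / sigma),
    an assumed adaptive-support-weight form (hypothetical parameters).
    """
    cost = np.asarray(cost, dtype=np.float64)
    image = np.asarray(image, dtype=np.float64)
    h, w = cost.shape
    # Replicate border values so every pixel has a full window.
    pad_cost = np.pad(cost, radius, mode="edge")
    pad_img = np.pad(image, radius, mode="edge")
    num = np.zeros((h, w))
    den = np.zeros((h, w))
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            nb_img = pad_img[dy:dy + h, dx:dx + w]
            nb_cost = pad_cost[dy:dy + h, dx:dx + w]
            # Similar intensities give weights near 1, dissimilar near 0.
            wgt = np.exp(-np.abs(image - nb_img) / sigma)
            num += wgt * nb_cost
            den += wgt
    # The centre pixel always has weight 1, so den >= 1 everywhere.
    return num / den

# Toy example: a vertical step edge in both the image and the cost map.
image = np.zeros((6, 8))
image[:, 4:] = 100.0
cost = np.zeros((6, 8))
cost[:, 4:] = 1.0
smoothed = asw_aggregate(cost, image)
```

In this toy case the aggregated cost stays sharp at the step, because pixels across the edge receive negligible weight; a plain box filter over the same window would blur it.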
Cite
Text
Zhang et al. "ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01237-3_48
Markdown
[Zhang et al. "ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/zhang2018eccv-activestereonet/) doi:10.1007/978-3-030-01237-3_48
BibTeX
@inproceedings{zhang2018eccv-activestereonet,
title = {{ActiveStereoNet: End-to-End Self-Supervised Learning for Active Stereo Systems}},
author = {Zhang, Yinda and Khamis, Sameh and Rhemann, Christoph and Valentin, Julien and Kowdle, Adarsh and Tankovich, Vladimir and Schoenberg, Michael and Izadi, Shahram and Funkhouser, Thomas and Fanello, Sean},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2018},
doi = {10.1007/978-3-030-01237-3_48},
url = {https://mlanthology.org/eccv/2018/zhang2018eccv-activestereonet/}
}