CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction
Abstract
Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for the goal of accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM, based on a scheme that privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g. along low-textured regions, and vice versa. We demonstrate the use of depth prediction to estimate the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, so as to yield a semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach.
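The fusion scheme described above blends two depth sources per pixel, leaning on the CNN prediction where monocular SLAM is unreliable (low-texture regions) and on SLAM measurements elsewhere. The following is a minimal illustrative sketch of such a weighting, not the paper's actual formulation: the texture cue via image gradients, the threshold, and the variance down-weighting are all assumptions chosen for clarity.

```python
import numpy as np

def fuse_depth(cnn_depth, slam_depth, slam_variance, image, grad_thresh=0.05):
    """Per-pixel blend of CNN-predicted and SLAM-estimated depth.

    Hypothetical weighting: trust SLAM depth in textured (high-gradient)
    regions and fall back to the CNN prediction in low-texture regions,
    further down-weighting SLAM pixels with high depth variance.
    """
    # Image gradient magnitude as a simple texture cue (assumption).
    gy, gx = np.gradient(image.astype(np.float64))
    texture = np.hypot(gx, gy)

    # Weight for the SLAM measurement: 0 in flat regions, 1 in textured ones.
    w_slam = np.clip(texture / grad_thresh, 0.0, 1.0)

    # Penalize uncertain SLAM depths (hypothetical uncertainty model).
    w_slam = w_slam / (1.0 + slam_variance)

    return w_slam * slam_depth + (1.0 - w_slam) * cnn_depth
```

In a textureless patch the weight collapses to zero and the fused depth equals the CNN prediction, which is exactly the failure mode of direct monocular SLAM the paper targets; the actual method integrates this per-pixel confidence inside the keyframe refinement rather than as a one-shot blend.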
Cite
Text
Tateno et al. "CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.695
Markdown
[Tateno et al. "CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/tateno2017cvpr-cnnslam/) doi:10.1109/CVPR.2017.695
BibTeX
@inproceedings{tateno2017cvpr-cnnslam,
title = {{CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction}},
author = {Tateno, Keisuke and Tombari, Federico and Laina, Iro and Navab, Nassir},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2017},
doi = {10.1109/CVPR.2017.695},
url = {https://mlanthology.org/cvpr/2017/tateno2017cvpr-cnnslam/}
}