CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction

Abstract

Given the recent advances in depth prediction from Convolutional Neural Networks (CNNs), this paper investigates how predicted depth maps from a deep neural network can be deployed for the goal of accurate and dense monocular reconstruction. We propose a method where CNN-predicted dense depth maps are naturally fused together with depth measurements obtained from direct monocular SLAM, based on a scheme that privileges depth prediction in image locations where monocular SLAM approaches tend to fail, e.g., along low-textured regions, and vice versa. We demonstrate the use of depth prediction to estimate the absolute scale of the reconstruction, hence overcoming one of the major limitations of monocular SLAM. Finally, we propose a framework to efficiently fuse semantic labels, obtained from a single frame, with dense SLAM, so as to yield semantically coherent scene reconstruction from a single view. Evaluation results on two benchmark datasets show the robustness and accuracy of our approach.

Cite

Text

Tateno et al. "CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.695

Markdown

[Tateno et al. "CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/tateno2017cvpr-cnnslam/) doi:10.1109/CVPR.2017.695

BibTeX

@inproceedings{tateno2017cvpr-cnnslam,
  title     = {{CNN-SLAM: Real-Time Dense Monocular SLAM with Learned Depth Prediction}},
  author    = {Tateno, Keisuke and Tombari, Federico and Laina, Iro and Navab, Nassir},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.695},
  url       = {https://mlanthology.org/cvpr/2017/tateno2017cvpr-cnnslam/}
}