How to Improve CNN-Based 6-DoF Camera Pose Estimation
Abstract
Convolutional neural networks (CNNs) and transfer learning have recently been used for 6 degrees of freedom (6-DoF) camera pose estimation. While they do not reach the same accuracy as visual SLAM-based approaches and are restricted to a specific environment, they excel in robustness and can be applied even to a single image. In this paper, we study PoseNet [1] and investigate modifications based on datasets' characteristics to improve the accuracy of the pose estimates. In particular, we emphasize the importance of field of view over image resolution; we present a data augmentation scheme to reduce overfitting; and we study the effect of Long Short-Term Memory (LSTM) cells. Lastly, we combine these modifications and improve PoseNet's performance for monocular CNN-based camera pose regression.
Cite
Text
Seifi and Tuytelaars. "How to Improve CNN-Based 6-DoF Camera Pose Estimation." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00471
Markdown
[Seifi and Tuytelaars. "How to Improve CNN-Based 6-DoF Camera Pose Estimation." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/seifi2019iccvw-improve/) doi:10.1109/ICCVW.2019.00471
BibTeX
@inproceedings{seifi2019iccvw-improve,
title = {{How to Improve CNN-Based 6-DoF Camera Pose Estimation}},
author = {Seifi, Soroush and Tuytelaars, Tinne},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
  pages = {3788--3795},
doi = {10.1109/ICCVW.2019.00471},
url = {https://mlanthology.org/iccvw/2019/seifi2019iccvw-improve/}
}