How to Improve CNN-Based 6-DoF Camera Pose Estimation

Abstract

Convolutional neural networks (CNNs) and transfer learning have recently been used for 6 degrees of freedom (6-DoF) camera pose estimation. While they do not reach the same accuracy as visual SLAM-based approaches and are restricted to a specific environment, they excel in robustness and can be applied even to a single image. In this paper, we study PoseNet [1] and investigate modifications based on the datasets' characteristics to improve the accuracy of the pose estimates. In particular, we emphasize the importance of field-of-view over image resolution; we present a data augmentation scheme to reduce overfitting; we study the effect of Long Short-Term Memory (LSTM) cells. Lastly, we combine these modifications and improve PoseNet's performance for monocular CNN-based camera pose regression.

Cite

Text

Seifi and Tuytelaars. "How to Improve CNN-Based 6-DoF Camera Pose Estimation." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00471

Markdown

[Seifi and Tuytelaars. "How to Improve CNN-Based 6-DoF Camera Pose Estimation." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/seifi2019iccvw-improve/) doi:10.1109/ICCVW.2019.00471

BibTeX

@inproceedings{seifi2019iccvw-improve,
  title     = {{How to Improve CNN-Based 6-DoF Camera Pose Estimation}},
  author    = {Seifi, Soroush and Tuytelaars, Tinne},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {3788--3795},
  doi       = {10.1109/ICCVW.2019.00471},
  url       = {https://mlanthology.org/iccvw/2019/seifi2019iccvw-improve/}
}