Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation
Abstract
Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion. State-of-the-art learning-based methods generate training samples via homography adaptation, creating synthetic 2D views of a single image with known keypoint matches. This approach, however, does not generalize to the non-planar 3D scenes and illumination variations common in real-world videos. In this work, we propose learning depth-aware keypoints in a fully self-supervised manner, directly from unlabeled videos. We jointly train keypoint and depth estimation networks by combining appearance and geometric matching through a differentiable structure-from-motion module based on Procrustean residual pose correction. We show how our self-supervised keypoints can be trivially incorporated into state-of-the-art visual odometry frameworks for robust and accurate ego-motion estimation of autonomous vehicles in real-world conditions.
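The pose estimation at the heart of such a pipeline can be understood through the classic orthogonal Procrustes (Kabsch/Umeyama) alignment: given keypoint matches back-projected to 3D with the predicted depth, the rigid transform between two frames has a closed-form, SVD-based solution that is differentiable end to end. The PyTorch sketch below shows a weighted solver of this kind; the function name, shapes, and confidence weighting are illustrative assumptions, and per the abstract the paper applies a learned residual correction on top of such a Procrustean estimate rather than using it alone.

import math
import torch

def procrustes_pose(X, Y, w=None):
    """Weighted Procrustes/Kabsch alignment (illustrative sketch, not the paper's exact API).

    Estimates the rigid transform (R, t) minimizing sum_i w_i ||R X_i + t - Y_i||^2
    between matched 3D points, e.g. keypoints back-projected with predicted depth.
    X, Y: (N, 3) matched points; w: optional (N,) non-negative match confidences.
    """
    if w is None:
        w = torch.ones(X.shape[0], dtype=X.dtype, device=X.device)
    w = w / w.sum()

    # Weighted centroids and centered point sets.
    mu_x = (w[:, None] * X).sum(dim=0)
    mu_y = (w[:, None] * Y).sum(dim=0)
    Xc, Yc = X - mu_x, Y - mu_y

    # Weighted cross-covariance and its SVD; gradients flow through torch.linalg.svd.
    H = Xc.T @ (w[:, None] * Yc)
    U, S, Vt = torch.linalg.svd(H)

    # Reflection correction keeps R a proper rotation (det(R) = +1).
    d = torch.sign(torch.det(Vt.T @ U.T))
    D = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = Vt.T @ D @ U.T
    t = mu_y - R @ mu_x
    return R, t

# Toy check: recover a known rotation and translation from noiseless matches.
c, s = math.cos(0.3), math.sin(0.3)
R_gt = torch.tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_gt = torch.tensor([0.5, -1.0, 2.0])
X = torch.randn(64, 3)
Y = X @ R_gt.T + t_gt
R, t = procrustes_pose(X, Y)
assert torch.allclose(R, R_gt, atol=1e-4) and torch.allclose(t, t_gt, atol=1e-4)

Because every step (weighted means, SVD, matrix products) is differentiable, gradients from an appearance or geometric matching loss on the aligned points can flow back into both the keypoint and depth networks, which is what enables the joint training described above.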
Cite
Text
Tang et al. "Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation." Conference on Robot Learning, 2020.

Markdown
[Tang et al. "Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation." Conference on Robot Learning, 2020.](https://mlanthology.org/corl/2020/tang2020corl-selfsupervised/)

BibTeX
@inproceedings{tang2020corl-selfsupervised,
  title = {{Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation}},
  author = {Tang, Jiexiong and Ambrus, Rares and Guizilini, Vitor and Pillai, Sudeep and Kim, Hanme and Jensfelt, Patric and Gaidon, Adrien},
  booktitle = {Conference on Robot Learning},
  year = {2020},
  pages = {2085--2103},
  volume = {155},
  url = {https://mlanthology.org/corl/2020/tang2020corl-selfsupervised/}
}