Two Stream Networks for Self-Supervised Ego-Motion Estimation

Abstract

Learning depth and camera ego-motion from raw unlabeled RGB video streams is seeing exciting progress through self-supervision from strong geometric cues. To leverage not only appearance but also scene geometry, we propose a novel self-supervised two-stream network using RGB and inferred depth information for accurate visual odometry. In addition, we introduce a sparsity-inducing data augmentation policy for ego-motion learning that effectively regularizes the pose network to enable stronger generalization performance. As a result, we show that our proposed two-stream pose network achieves state-of-the-art results among learning-based methods on the KITTI odometry benchmark, and is especially suited for self-supervision at scale. Our experiments on a large-scale urban driving dataset of 1 million frames indicate that the performance of our proposed architecture does indeed scale progressively with more data.
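The two-stream idea described in the abstract can be illustrated with a minimal NumPy sketch: one stream processes appearance (RGB) input, a second processes inferred depth, and their features are fused to regress a 6-DoF pose. All dimensions, names, and the MLP streams below are hypothetical toy choices for illustration, not the paper's convolutional architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp_stream(x, w1, w2):
    # One "stream": a tiny two-layer MLP with ReLU, standing in for
    # the convolutional feature extractor a real implementation would use.
    h = np.maximum(x @ w1, 0.0)
    return h @ w2

# Hypothetical toy dimensions (not from the paper): flattened inputs.
rgb_dim, depth_dim, hid, feat = 48, 16, 32, 8

w_rgb1 = rng.normal(size=(rgb_dim, hid))
w_rgb2 = rng.normal(size=(hid, feat))
w_d1 = rng.normal(size=(depth_dim, hid))
w_d2 = rng.normal(size=(hid, feat))
w_pose = rng.normal(size=(2 * feat, 6))  # 6-DoF: 3 translation + 3 rotation

def two_stream_pose(rgb, depth):
    # Extract features from each modality with its own stream, then
    # fuse by concatenation and regress a 6-DoF ego-motion vector.
    f_rgb = mlp_stream(rgb, w_rgb1, w_rgb2)
    f_depth = mlp_stream(depth, w_d1, w_d2)
    fused = np.concatenate([f_rgb, f_depth], axis=-1)
    return fused @ w_pose

batch = 4
rgb = rng.normal(size=(batch, rgb_dim))      # stand-in for RGB frame pairs
depth = rng.normal(size=(batch, depth_dim))  # stand-in for inferred depth maps
pose = two_stream_pose(rgb, depth)
print(pose.shape)  # one 6-DoF estimate per sample
```

In a full self-supervised pipeline, the predicted pose together with the inferred depth would warp the source frame into the target view, and a photometric reconstruction loss would supervise both networks without ground-truth labels.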

Cite

Text

Ambrus et al. "Two Stream Networks for Self-Supervised Ego-Motion Estimation." Conference on Robot Learning, 2019.

Markdown

[Ambrus et al. "Two Stream Networks for Self-Supervised Ego-Motion Estimation." Conference on Robot Learning, 2019.](https://mlanthology.org/corl/2019/ambrus2019corl-two/)

BibTeX

@inproceedings{ambrus2019corl-two,
  title     = {{Two Stream Networks for Self-Supervised Ego-Motion Estimation}},
  author    = {Ambrus, Rares and Guizilini, Vitor and Li, Jie and Pillai, Sudeep and Gaidon, Adrien},
  booktitle = {Conference on Robot Learning},
  year      = {2019},
  pages     = {1052--1061},
  volume    = {100},
  url       = {https://mlanthology.org/corl/2019/ambrus2019corl-two/}
}