RAFT-3D: Scene Flow Using Rigid-Motion Embeddings

Abstract

We address the problem of scene flow: given a pair of stereo or RGB-D video frames, estimate pixelwise 3D motion. We introduce RAFT-3D, a new deep architecture for scene flow. RAFT-3D is based on the RAFT model developed for optical flow, but instead of 2D motion it iteratively updates a dense field of pixelwise SE(3) motions. A key innovation of RAFT-3D is rigid-motion embeddings, which represent a soft grouping of pixels into rigid objects. Integral to rigid-motion embeddings is Dense-SE3, a differentiable layer that enforces geometric consistency of the embeddings. Experiments show that RAFT-3D achieves state-of-the-art performance. On FlyingThings3D, under the two-view evaluation, we improve the best published accuracy (δ < 0.05) from 34.3% to 83.7%. On KITTI, we achieve an error of 5.77, outperforming the best published method (6.31), despite using no object instance supervision.
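The central object in the abstract, a dense field of per-pixel SE(3) motions, can be illustrated with a short NumPy sketch. It assumes a pinhole camera with intrinsics K, a depth map for the first frame, and a rotation and translation already given for every pixel. It shows how such a field induces 3D scene flow, one hypothetical affinity function for comparing rigid-motion embeddings, and a threshold accuracy of the δ < 0.05 kind (assuming δ thresholds the per-pixel 3D end-point error). The names `backproject`, `scene_flow`, `rigid_affinity`, and `threshold_accuracy` are illustrative assumptions; this is not the authors' code, and it does not implement the Dense-SE3 layer or the iterative update.

```python
import numpy as np


def backproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H, W, 3) with pinhole intrinsics K."""
    H, W = depth.shape
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.mgrid[0:H, 0:W].astype(np.float64)  # row (v) and column (u) pixel coordinates
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def scene_flow(depth, K, R, t):
    """3D flow X' - X induced by a dense SE(3) field.

    R: (H, W, 3, 3) per-pixel rotations, t: (H, W, 3) per-pixel translations.
    """
    X = backproject(depth, K)
    X_moved = np.einsum("hwij,hwj->hwi", R, X) + t  # apply each pixel's own rigid motion
    return X_moved - X


def rigid_affinity(e_i, e_j):
    """Hypothetical soft 'same rigid object' weight in (0, 1] from two embedding vectors.

    Equals 1 for identical embeddings and decays toward 0 as they move apart,
    computed as 2 * sigmoid(-squared distance); an assumed form, for illustration only.
    """
    d2 = float(np.sum((np.asarray(e_i) - np.asarray(e_j)) ** 2))
    return float(2.0 / (1.0 + np.exp(d2)))


def threshold_accuracy(flow_pred, flow_gt, delta=0.05):
    """Fraction of pixels whose 3D end-point error is below delta."""
    epe = np.linalg.norm(flow_pred - flow_gt, axis=-1)
    return float(np.mean(epe < delta))


if __name__ == "__main__":
    depth = np.ones((1, 2))
    K = np.eye(3)  # fx = fy = 1, cx = cy = 0
    Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])  # 90 degrees about z
    R = np.broadcast_to(Rz, (1, 2, 3, 3))  # same rotation at every pixel
    t = np.zeros((1, 2, 3))
    flow = scene_flow(depth, K, R, t)
    print(flow[0, 1])  # point (1, 0, 1) moves to (0, 1, 1)
    print(threshold_accuracy(flow, np.zeros_like(flow)))  # 0.5
```

Because every pixel carries its own rigid motion, pixels on the same object should end up with (nearly) identical R and t; a soft grouping such as the hypothetical `rigid_affinity` above is one way to express which pixels ought to share a motion, without committing to hard object masks.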

Cite

Text

Teed and Deng. "RAFT-3D: Scene Flow Using Rigid-Motion Embeddings." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00827

Markdown

[Teed and Deng. "RAFT-3D: Scene Flow Using Rigid-Motion Embeddings." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/teed2021cvpr-raft3d/) doi:10.1109/CVPR46437.2021.00827

BibTeX

@inproceedings{teed2021cvpr-raft3d,
  title     = {{RAFT-3D: Scene Flow Using Rigid-Motion Embeddings}},
  author    = {Teed, Zachary and Deng, Jia},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {8375-8384},
  doi       = {10.1109/CVPR46437.2021.00827},
  url       = {https://mlanthology.org/cvpr/2021/teed2021cvpr-raft3d/}
}