Structure and Motion from Casual Videos

Abstract

Casual videos, such as those captured in daily life using a hand-held cell phone, pose problems for conventional structure-from-motion (SfM) techniques: the camera is often roughly stationary (yielding little parallax), and a large portion of the video may contain moving objects. Under such conditions, state-of-the-art SfM methods tend to produce erroneous results, often failing entirely. To address these issues, we propose CasualSAM, a method to estimate camera poses and dense depth maps from a monocular, casually captured video. Like conventional SfM, our method performs a joint optimization over 3D structure and camera poses, but uses a pretrained depth prediction network to represent 3D structure rather than sparse keypoints. In contrast to previous approaches, our method does not assume motion is rigid or determined by semantic segmentation, instead optimizing for a per-pixel motion map based on reprojection error. Our method sets a new state of the art for pose and depth estimation on the Sintel dataset, and produces high-quality results for the DAVIS dataset, where most prior methods fail to produce usable camera poses.
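To make the notion of reprojection error concrete, here is a minimal sketch of how a per-pixel residual could be computed for a single pixel. It is not the authors' implementation: it assumes a pinhole camera, a known relative pose (R, t) between two frames, and an observed correspondence (for example, from optical flow). All function names and numeric values are hypothetical and chosen only for illustration.

```python
import math

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with the given depth to a 3D point in the camera frame."""
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)

def rigid_transform(R, t, X):
    """Apply X' = R X + t, with R a 3x3 nested list and t a 3-vector."""
    return tuple(sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3))

def project(X, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates."""
    x, y, z = X
    return (fx * x / z + cx, fy * y / z + cy)

def reprojection_residual(uv, depth, R, t, observed_uv, intrinsics):
    """Pixel distance between where a static scene point would land in the
    second frame and where the correspondence was actually observed."""
    fx, fy, cx, cy = intrinsics
    X = backproject(uv[0], uv[1], depth, fx, fy, cx, cy)
    u2, v2 = project(rigid_transform(R, t, X), fx, fy, cx, cy)
    return math.hypot(u2 - observed_uv[0], v2 - observed_uv[1])

# Hypothetical setup: identity rotation, small sideways camera translation.
intrinsics = (500.0, 500.0, 320.0, 240.0)  # fx, fy, cx, cy (assumed values)
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.1, 0.0, 0.0)

# A pixel whose observed match agrees with the rigid-scene prediction.
static_residual = reprojection_residual((320.0, 240.0), 2.0, R, t, (345.0, 240.0), intrinsics)
# A pixel whose observed match deviates, as it would on a moving object.
moving_residual = reprojection_residual((320.0, 240.0), 2.0, R, t, (360.0, 240.0), intrinsics)
print(static_residual, moving_residual)
```

In this toy setup the first pixel is explained by camera motion alone, while the second leaves a 15-pixel residual. A large residual can signal object motion, but it can equally come from an error in depth or pose, which is one reason the abstract describes optimizing these quantities jointly rather than thresholding a fixed residual.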

Cite

Text

Zhang et al. "Structure and Motion from Casual Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19827-4_2

Markdown

[Zhang et al. "Structure and Motion from Casual Videos." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/zhang2022eccv-structure/) doi:10.1007/978-3-031-19827-4_2

BibTeX

@inproceedings{zhang2022eccv-structure,
  title     = {{Structure and Motion from Casual Videos}},
  author    = {Zhang, Zhoutong and Cole, Forrester and Li, Zhengqi and Snavely, Noah and Rubinstein, Michael and Freeman, William T.},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19827-4_2},
  url       = {https://mlanthology.org/eccv/2022/zhang2022eccv-structure/}
}