Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion

Abstract

We present Video Autoencoder for learning disentangled representations of 3D structure and camera pose from videos in a self-supervised manner. Relying on the temporal continuity of videos, we assume that the 3D scene structure in nearby video frames remains static. Given a sequence of video frames as input, the Video Autoencoder extracts a disentangled representation of the scene including: (i) a temporally-consistent deep voxel feature to represent the 3D structure and (ii) a 3D trajectory of camera poses, one for each frame. These two representations are then re-entangled to render the input video frames. The Video Autoencoder can be trained directly with a pixel reconstruction loss, without any ground-truth 3D or camera-pose annotations. The disentangled representation can be applied to a range of tasks, including novel view synthesis, camera pose estimation, and video generation by motion following. We evaluate our method on several large-scale natural video datasets and show generalization results on out-of-domain images.
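To make the encode/re-entangle pipeline from the abstract concrete, below is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the authors' implementation: the module shapes, the pose_to_affine helper, and the small-angle rotation approximation are all hypothetical stand-ins for the paper's actual encoders, 3D warping, and renderer.

import torch
import torch.nn as nn
import torch.nn.functional as F

def pose_to_affine(pose):
    # Hypothetical helper: map a 6-vector (3 rotation, 3 translation) to a
    # 3x4 affine matrix using a small-angle approximation, R ~ I + [w]_x.
    w, t = pose[:, :3], pose[:, 3:]
    eye = torch.eye(3, device=pose.device).unsqueeze(0).expand(pose.shape[0], 3, 3).clone()
    skew = torch.zeros(pose.shape[0], 3, 3, device=pose.device)
    skew[:, 0, 1], skew[:, 0, 2] = -w[:, 2], w[:, 1]
    skew[:, 1, 0], skew[:, 1, 2] = w[:, 2], -w[:, 0]
    skew[:, 2, 0], skew[:, 2, 1] = -w[:, 1], w[:, 0]
    return torch.cat([eye + skew, t.unsqueeze(-1)], dim=-1)  # (B, 3, 4)

class VideoAutoencoderSketch(nn.Module):
    def __init__(self, feat_dim=32, depth=32):
        super().__init__()
        self.feat_dim, self.depth = feat_dim, depth
        # (i) Structure encoder: lifts a reference frame into a deep voxel feature.
        self.structure_enc = nn.Sequential(
            nn.Conv2d(3, feat_dim * depth, 3, padding=1), nn.ReLU())
        # (ii) Pose encoder: predicts a 6-DoF pose of each frame relative to the reference.
        self.pose_enc = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 6))
        # Decoder: renders an image from the pose-transformed voxel feature.
        self.decoder = nn.Sequential(
            nn.Conv2d(feat_dim * depth, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Sigmoid())

    def forward(self, frames):
        # frames: (T, 3, H, W); frame 0 serves as the reference view.
        T = frames.shape[0]
        feat2d = self.structure_enc(frames[:1])                      # (1, C*D, H, W)
        voxels = feat2d.view(1, self.feat_dim, self.depth,
                             *feat2d.shape[-2:])                     # (1, C, D, H, W)
        recons = []
        for t in range(T):
            pair = torch.cat([frames[:1], frames[t:t + 1]], dim=1)   # (1, 6, H, W)
            pose = self.pose_enc(pair)                               # (1, 6)
            # Re-entangle: rigidly transform the static voxel grid by the
            # predicted camera pose, then project and decode frame t.
            theta = pose_to_affine(pose)
            grid = F.affine_grid(theta, voxels.shape, align_corners=False)
            warped = F.grid_sample(voxels, grid, align_corners=False)
            recons.append(self.decoder(warped.flatten(1, 2)))
        return torch.cat(recons, dim=0)                              # (T, 3, H, W)

Training then reduces to the self-supervised objective the abstract describes, e.g. loss = F.l1_loss(model(frames), frames), with no 3D or camera-pose supervision.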

Cite

Text

Lai et al. "Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00959

Markdown

[Lai et al. "Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/lai2021iccv-video/) doi:10.1109/ICCV48922.2021.00959

BibTeX

@inproceedings{lai2021iccv-video,
  title     = {{Video Autoencoder: Self-Supervised Disentanglement of Static 3D Structure and Motion}},
  author    = {Lai, Zihang and Liu, Sifei and Efros, Alexei A. and Wang, Xiaolong},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {9730--9740},
  doi       = {10.1109/ICCV48922.2021.00959},
  url       = {https://mlanthology.org/iccv/2021/lai2021iccv-video/}
}