Unsupervised Volumetric Animation

Abstract

We propose a novel approach for unsupervised 3D animation of non-rigid deformable objects. Our method learns the 3D structure and dynamics of objects solely from single-view RGB videos, and can decompose them into semantically meaningful parts that can be tracked and animated. Using a 3D autodecoder framework, paired with a keypoint estimator via a differentiable PnP algorithm, our model learns the underlying object geometry and parts decomposition in an entirely unsupervised manner. This allows it to perform 3D segmentation, 3D keypoint estimation, novel view synthesis, and animation. We primarily evaluate the framework on two video datasets: VoxCeleb 256^2 and TEDXPeople 256^2. In addition, on the Cats 256^2 dataset, we show that it learns compelling 3D geometry even from raw image data. Finally, we show that our model can obtain animatable 3D objects from a single or a few images.
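The key idea of the pose step is that camera/part poses are recovered from predicted keypoint correspondences in a way that remains differentiable, so gradients reach the keypoint estimator. The sketch below is only an illustration of that idea, not the paper's solver: instead of the perspective differentiable PnP used in the work, it solves a weak-perspective (orthographic Procrustes) pose from 3D-2D correspondences in PyTorch. All names and the camera model here are assumptions made for the example.

    # Minimal sketch of a differentiable pose solve from 3D<->2D keypoints.
    # NOTE: this is a weak-perspective Procrustes stand-in, NOT the paper's
    # differentiable PnP; function and variable names are illustrative.
    import torch

    def weak_perspective_pose(X, x):
        """X: (N, 3) canonical 3D keypoints; x: (N, 2) observed 2D keypoints.
        Returns rotation R (3, 3), translation t (2,), and scale s.
        Every op (means, SVD, matmul) is differentiable, so gradients flow
        back to both the 3D points and the 2D keypoint predictor."""
        X_mu, x_mu = X.mean(dim=0), x.mean(dim=0)
        Xc, xc = X - X_mu, x - x_mu                 # center both point sets
        M = xc.T @ Xc                               # (2, 3) cross-covariance
        U, S, Vh = torch.linalg.svd(M, full_matrices=False)
        R2 = U @ Vh                                 # top two rows of R, row-orthonormal
        # Optimal scale for the model x ~ s * (R2 @ X) + t
        s = S.sum() / (Xc @ R2.T).pow(2).sum().clamp_min(1e-8)
        t = x_mu - s * (R2 @ X_mu)
        # Complete the rotation with the cross product of its first two rows
        r3 = torch.linalg.cross(R2[0], R2[1]).unsqueeze(0)
        return torch.cat([R2, r3], dim=0), t, s

    # Sanity check: with noiseless weak-perspective observations the true
    # pose is recovered exactly, and gradients reach the 3D keypoints.
    X = torch.randn(12, 3, requires_grad=True)
    x = 1.5 * X[:, :2] + torch.tensor([0.1, -0.2])  # identity rotation, s=1.5
    R, t, s = weak_perspective_pose(X, x)
    s.backward()                                    # X.grad is now populated

Because the solve is a closed-form SVD rather than an iterative optimizer, it stays cheap and stable inside a training loop; the paper's perspective PnP plays the analogous role with a full projective camera.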

Cite

Text

Siarohin et al. "Unsupervised Volumetric Animation." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00452

Markdown

[Siarohin et al. "Unsupervised Volumetric Animation." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/siarohin2023cvpr-unsupervised/) doi:10.1109/CVPR52729.2023.00452

BibTeX

@inproceedings{siarohin2023cvpr-unsupervised,
  title     = {{Unsupervised Volumetric Animation}},
  author    = {Siarohin, Aliaksandr and Menapace, Willi and Skorokhodov, Ivan and Olszewski, Kyle and Ren, Jian and Lee, Hsin-Ying and Chai, Menglei and Tulyakov, Sergey},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {4658--4669},
  doi       = {10.1109/CVPR52729.2023.00452},
  url       = {https://mlanthology.org/cvpr/2023/siarohin2023cvpr-unsupervised/}
}