Reconstructing Animatable Categories from Videos

Abstract

Building animatable 3D models is challenging due to the need for 3D scans, laborious registration, and manual rigging. Differentiable rendering has recently provided a pathway to obtain high-quality 3D models from monocular videos, but existing approaches are limited to rigid categories or single instances. We present RAC, a method for building category-level 3D models from monocular videos that disentangles variation across instances from motion over time. Three key ideas are introduced to solve this problem: (1) specializing a category-level skeleton to individual instances, (2) a latent-space regularization method that encourages structure shared across a category while preserving instance-specific details, and (3) using 3D background models to disentangle objects from the background. Given monocular videos, we build 3D models for humans, cats, and dogs. Project page: gengshan-y.github.io/rac-www/

Cite

Text

Yang et al. "Reconstructing Animatable Categories from Videos." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01630

Markdown

[Yang et al. "Reconstructing Animatable Categories from Videos." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/yang2023cvpr-reconstructing/) doi:10.1109/CVPR52729.2023.01630

BibTeX

@inproceedings{yang2023cvpr-reconstructing,
  title     = {{Reconstructing Animatable Categories from Videos}},
  author    = {Yang, Gengshan and Wang, Chaoyang and Reddy, N. Dinesh and Ramanan, Deva},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {16995--17005},
  doi       = {10.1109/CVPR52729.2023.01630},
  url       = {https://mlanthology.org/cvpr/2023/yang2023cvpr-reconstructing/}
}