Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows

Abstract

Monocular 3D human pose and shape estimation is challenging due to the many degrees of freedom of the human body and the difficulty of acquiring training data for large-scale supervised learning in complex visual scenes, where humans with diverse shape and appearance appear against complex backgrounds, in a variety of poses, and are partially occluded or involved in interactions. Essential to learning is leveraging effective 3D human priors, and the ability to work under weak supervision, at scale, by exploiting, to the largest extent, the detailed human body semantics in images. In this paper we present new priors as well as large-scale weakly supervised models for 3D human pose and shape estimation. Key to our formulation are new latent normalizing flow representations, as well as fully differentiable, structurally-sensitive, semantic body part alignment (re-projection) loss functions that ensure consistent estimates and sharp feedback signals for learning. In extensive experiments using both motion capture datasets like CMU, Human3.6M, 3DPW, or AMASS, and repositories like COCO, we show that our proposed methods outperform existing counterparts, supporting the construction of an increasingly accurate family of models based on large-scale training with unlabeled image data.

Cite

Text

Zanfir et al. "Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58539-6_28

Markdown

[Zanfir et al. "Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/zanfir2020eccv-weakly/) doi:10.1007/978-3-030-58539-6_28

BibTeX

@inproceedings{zanfir2020eccv-weakly,
  title     = {{Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows}},
  author    = {Zanfir, Andrei and Bazavan, Eduard Gabriel and Xu, Hongyi and Freeman, William T. and Sukthankar, Rahul and Sminchisescu, Cristian},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58539-6_28},
  url       = {https://mlanthology.org/eccv/2020/zanfir2020eccv-weakly/}
}