Learning Markerless Human Pose Estimation from Multiple Viewpoint Video

Abstract

We present a novel human performance capture technique capable of robustly estimating the pose (articulated joint positions) of a performer observed passively via multiple viewpoint video (MVV). An affine invariant pose descriptor is learned using a convolutional neural network (CNN) trained over volumetric data extracted from an MVV dataset of diverse human pose and appearance. A manifold embedding is learned via Gaussian Processes for the CNN descriptor and articulated pose spaces, enabling regression and hence estimation of human pose from MVV input. The learned descriptor and manifold are shown to generalise over a wide range of human poses, providing an efficient performance capture solution that requires no fiducials or other markers to be worn. The system is evaluated against ground truth joint configuration data from a commercial marker-based pose estimation system.
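The core idea of the abstract — regressing articulated joint positions from a learned descriptor via a Gaussian Process mapping — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the descriptor dimensionality, joint count, kernel choice, and synthetic data are all assumptions for demonstration.

```python
# Illustrative sketch (not the paper's code): Gaussian Process regression
# from a pose-descriptor space to articulated joint positions, in the
# spirit of the descriptor-to-pose mapping described in the abstract.
# Sizes, kernel, and data below are hypothetical.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

N_TRAIN, DESC_DIM, N_JOINTS = 200, 64, 15  # assumed sizes

# Stand-ins for CNN pose descriptors and marker-based ground-truth
# joint positions (15 joints x 3D = 45-dimensional pose vectors).
X_train = rng.standard_normal((N_TRAIN, DESC_DIM))
W = rng.standard_normal((DESC_DIM, N_JOINTS * 3))
Y_train = X_train @ W  # synthetic descriptor -> pose relationship

# Fit a GP mapping descriptor space to pose space (multi-output).
gp = GaussianProcessRegressor(kernel=RBF(length_scale=8.0), alpha=1e-3)
gp.fit(X_train, Y_train)

# Estimate the pose for a new MVV frame's descriptor.
x_query = rng.standard_normal((1, DESC_DIM))
pose = gp.predict(x_query).reshape(N_JOINTS, 3)
print(pose.shape)  # one 3D position per joint
```

In the paper itself the mapping operates on a learned manifold embedding of both spaces rather than a direct regression, but the input/output contract — descriptor in, joint positions out — is the same.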

Cite

Text

Trumble et al. "Learning Markerless Human Pose Estimation from Multiple Viewpoint Video." European Conference on Computer Vision Workshops, 2016. doi:10.1007/978-3-319-49409-8_70

Markdown

[Trumble et al. "Learning Markerless Human Pose Estimation from Multiple Viewpoint Video." European Conference on Computer Vision Workshops, 2016.](https://mlanthology.org/eccvw/2016/trumble2016eccvw-learning/) doi:10.1007/978-3-319-49409-8_70

BibTeX

@inproceedings{trumble2016eccvw-learning,
  title     = {{Learning Markerless Human Pose Estimation from Multiple Viewpoint Video}},
  author    = {Trumble, Matthew and Gilbert, Andrew and Hilton, Adrian and Collomosse, John P.},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2016},
  pages     = {871--878},
  doi       = {10.1007/978-3-319-49409-8_70},
  url       = {https://mlanthology.org/eccvw/2016/trumble2016eccvw-learning/}
}