Multi-Source Deep Learning for Human Pose Estimation

Abstract

Visual appearance scores, appearance mixture types, and deformations are three important information sources for human pose estimation. This paper proposes a multi-source deep model that extracts a non-linear representation from these different information sources. With the deep model, the global, high-order human-body articulation patterns in these information sources are extracted for pose estimation. The task of estimating body-part locations and the task of human detection are learned jointly in a unified deep model. The proposed approach can be viewed as a post-processing step on pose estimation results and can be flexibly integrated with existing methods by taking their information sources as input. By extracting a non-linear representation from multiple information sources, the deep model outperforms the state of the art by up to 8.6 percent on three public benchmark datasets.
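The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of the general idea: per-part appearance scores, mixture-type indicators, and deformation features are concatenated and passed through a shared non-linear trunk with two output heads, one scoring body-part locations and one scoring human detection. All layer sizes, part counts, and names here are assumptions for illustration; the paper's actual model and training procedure differ.

```python
import torch
import torch.nn as nn

class MultiSourceDeepModel(nn.Module):
    """Hypothetical sketch of a multi-source deep model (not the paper's
    exact architecture). Inputs mirror the three information sources the
    abstract names: appearance scores, mixture types, and deformations."""

    def __init__(self, n_parts=26, n_mix_types=6, deform_dim=4, hidden=128):
        super().__init__()
        in_dim = n_parts * (1 + n_mix_types + deform_dim)
        # Shared non-linear trunk over the concatenated sources.
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.pose_head = nn.Linear(hidden, n_parts)  # per-part location scores
        self.detect_head = nn.Linear(hidden, 1)      # human-detection score

    def forward(self, scores, mix_types, deformations):
        # scores:       (B, n_parts)              appearance scores
        # mix_types:    (B, n_parts, n_mix_types) one-hot mixture types
        # deformations: (B, n_parts, deform_dim)  part displacement features
        x = torch.cat(
            [scores, mix_types.flatten(1), deformations.flatten(1)], dim=1
        )
        h = self.trunk(x)
        return self.pose_head(h), self.detect_head(h)


# Usage on dummy inputs (batch of 2):
model = MultiSourceDeepModel()
pose, det = model(
    torch.randn(2, 26), torch.randn(2, 26, 6), torch.randn(2, 26, 4)
)
```

Joint training in this sketch would optimize a weighted sum of a pose loss on the first head and a detection loss on the second, which is one way to realize the joint learning of the two tasks that the abstract describes.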

Cite

Text

Ouyang et al. "Multi-Source Deep Learning for Human Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2014. doi:10.1109/CVPR.2014.299

Markdown

[Ouyang et al. "Multi-Source Deep Learning for Human Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2014.](https://mlanthology.org/cvpr/2014/ouyang2014cvpr-multisource/) doi:10.1109/CVPR.2014.299

BibTeX

@inproceedings{ouyang2014cvpr-multisource,
  title     = {{Multi-Source Deep Learning for Human Pose Estimation}},
  author    = {Ouyang, Wanli and Chu, Xiao and Wang, Xiaogang},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2014},
  doi       = {10.1109/CVPR.2014.299},
  url       = {https://mlanthology.org/cvpr/2014/ouyang2014cvpr-multisource/}
}