Integrating Grammar and Segmentation for Human Pose Estimation

Abstract

In this paper we present a compositional and-or graph grammar model for human pose estimation. Our model has three distinguishing features: (i) large appearance differences between people are handled compositionally by allowing parts or collections of parts to be substituted with alternative variants, (ii) each variant is a sub-model that can define its own articulated geometry and context-sensitive compatibility with neighboring part variants, and (iii) background region segmentation is incorporated into the part appearance models to better estimate the contrast between a part region and its surroundings and to improve resilience to background clutter. The resulting integrated model is trained discriminatively in a max-margin framework using an efficient and exact inference algorithm. We present an experimental evaluation of our model on two popular datasets and show performance improvements over the state of the art on both benchmarks.
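
To illustrate the and-or structure described in the abstract, here is a minimal sketch of exact inference on a toy and-or tree. It is not the authors' implementation. The `Node` class, the `best_parse` function, the part names, and all scores are hypothetical. The sketch assumes only the generic convention that an and-node sums the scores of its children (composition) and an or-node picks the best-scoring child (choice among variants). It leaves out the articulated geometry, the compatibility terms between neighboring variants, and the segmentation-based appearance model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A node in a toy and-or tree (hypothetical structure for illustration)."""
    name: str
    kind: str                      # "leaf", "and", or "or"
    score: float = 0.0             # made-up appearance score, used by leaves only
    children: List["Node"] = field(default_factory=list)


def best_parse(node: Node) -> Tuple[float, List[str]]:
    """Return the best total score and the leaf names selected under `node`."""
    if node.kind == "leaf":
        return node.score, [node.name]

    results = [best_parse(child) for child in node.children]

    if node.kind == "and":
        # Composition: every child is required, so scores add up.
        total = sum(score for score, _ in results)
        leaves = [name for _, names in results for name in names]
        return total, leaves

    if node.kind == "or":
        # Choice: keep only the best-scoring variant.
        return max(results, key=lambda result: result[0])

    raise ValueError(f"unknown node kind: {node.kind!r}")


# Two alternative arm variants, each composed of two parts.
straight_arm = Node("straight_arm", "and", children=[
    Node("upper_arm_straight", "leaf", score=1.0),
    Node("lower_arm_straight", "leaf", score=0.5),
])
bent_arm = Node("bent_arm", "and", children=[
    Node("upper_arm_bent", "leaf", score=0.75),
    Node("lower_arm_bent", "leaf", score=1.25),
])
arm = Node("arm", "or", children=[straight_arm, bent_arm])
body = Node("body", "and", children=[Node("torso", "leaf", score=2.0), arm])

score, leaves = best_parse(body)
print(score, leaves)
```

In this toy example the or-node chooses the bent-arm variant because its parts sum to a higher score than those of the straight-arm variant. Because the structure is a tree, a single bottom-up pass yields the exact optimum.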

Cite

Text

Rothrock et al. "Integrating Grammar and Segmentation for Human Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2013. doi:10.1109/CVPR.2013.413

Markdown

[Rothrock et al. "Integrating Grammar and Segmentation for Human Pose Estimation." Conference on Computer Vision and Pattern Recognition, 2013.](https://mlanthology.org/cvpr/2013/rothrock2013cvpr-integrating/) doi:10.1109/CVPR.2013.413

BibTeX

@inproceedings{rothrock2013cvpr-integrating,
  title     = {{Integrating Grammar and Segmentation for Human Pose Estimation}},
  author    = {Rothrock, Brandon and Park, Seyoung and Zhu, Song-Chun},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2013},
  doi       = {10.1109/CVPR.2013.413},
  url       = {https://mlanthology.org/cvpr/2013/rothrock2013cvpr-integrating/}
}