Pose Proposal Networks

Abstract

We propose a novel method to detect an unknown number of articulated 2D poses in real time. To decouple the runtime complexity of pixel-wise body part detectors from the resolution of their convolutional neural network (CNN) feature maps, our approach, called pose proposal networks, applies a state-of-the-art single-shot object detection paradigm based on grid-wise image feature maps to bottom-up pose detection. Body part proposals, represented as region proposals, and limbs are detected directly by a single-shot CNN. Tailored to these detections, a bottom-up greedy parsing step is probabilistically redesigned to take the global context into account. Experimental results on the MPII Multi-Person benchmark confirm that our method achieves 72.8% mAP, comparable to state-of-the-art bottom-up approaches, while its total runtime on a GeForce GTX1080Ti card is as low as 5.6 ms (180 FPS), which is shorter than the bottleneck runtimes observed in state-of-the-art approaches.
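The abstract only summarizes the parsing step. As a rough illustration of what bottom-up greedy parsing of limb detections involves, the Python sketch below assembles part proposals into person instances by greedily accepting the highest-scoring limb connections, using each proposal at most once per limb. This is a generic greedy matcher built on assumptions, not the paper's method: the part and limb names, the score dictionaries, and the functions `greedy_match` and `assemble` are all hypothetical, and the sketch does not reproduce the paper's probabilistic formulation that accounts for global context.

```python
# Hypothetical sketch of greedy bottom-up parsing (not the paper's exact algorithm).
# Assumption: limbs form a tree and are listed so parents come before children.


def greedy_match(scores):
    """Greedily pair parent and child proposals for one limb type.

    scores maps (parent_idx, child_idx) -> confidence that both proposals
    belong to the same person. Pairs are accepted in descending score order,
    and each proposal is used at most once.
    """
    used_parent, used_child, pairs = set(), set(), []
    for (i, j), s in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        if s <= 0 or i in used_parent or j in used_child:
            continue  # skip weak links and proposals already matched
        pairs.append((i, j))
        used_parent.add(i)
        used_child.add(j)
    return pairs


def assemble(limbs, limb_scores):
    """Grow person instances limb by limb in tree order.

    limbs: list of (parent_part, child_part) names.
    limb_scores: dict mapping each limb to a score dict for greedy_match.
    Returns a list of persons, each a dict part_name -> proposal index.
    """
    persons = []
    for limb in limbs:
        parent, child = limb
        for i, j in greedy_match(limb_scores.get(limb, {})):
            # Attach to the person that already owns parent proposal i,
            # or start a new person if none does.
            owner = next((p for p in persons if p.get(parent) == i), None)
            if owner is None:
                owner = {parent: i}
                persons.append(owner)
            owner[child] = j
    return persons


if __name__ == "__main__":
    limbs = [("head", "neck"), ("neck", "torso")]
    limb_scores = {
        ("head", "neck"): {(0, 0): 0.9, (0, 1): 0.2, (1, 1): 0.8, (1, 0): 0.3},
        ("neck", "torso"): {(0, 1): 0.7, (1, 0): 0.6, (0, 0): 0.1},
    }
    print(assemble(limbs, limb_scores))
```

Greedy assignment keeps parsing cheap, because each limb type is resolved with a single sort instead of a global optimization. This fits the abstract's emphasis on keeping total runtime below the bottleneck stages of other approaches.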

Cite

Text

Sekii, Taiki. "Pose Proposal Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01261-8_21

Markdown

[Sekii, Taiki. "Pose Proposal Networks." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/sekii2018eccv-pose/) doi:10.1007/978-3-030-01261-8_21

BibTeX

@inproceedings{sekii2018eccv-pose,
  title     = {{Pose Proposal Networks}},
  author    = {Sekii, Taiki},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01261-8_21},
  url       = {https://mlanthology.org/eccv/2018/sekii2018eccv-pose/}
}