HigherHRNet: Scale-Aware Representation Learning for Bottom-up Human Pose Estimation

Abstract

Bottom-up human pose estimation methods have difficulty predicting the correct pose for small persons because of scale variation. In this paper, we present HigherHRNet, a novel bottom-up human pose estimation method that learns scale-aware representations using high-resolution feature pyramids. Equipped with multi-resolution supervision for training and multi-resolution aggregation for inference, the proposed approach addresses the scale variation challenge in bottom-up multi-person pose estimation and localizes keypoints more precisely, especially for small persons. The feature pyramid in HigherHRNet consists of the feature map outputs from HRNet and higher-resolution outputs obtained by upsampling them through a transposed convolution. HigherHRNet outperforms the previous best bottom-up method by 2.5% AP for medium persons on COCO test-dev, showing its effectiveness in handling scale variation. Furthermore, HigherHRNet achieves a new state-of-the-art result on COCO test-dev (70.5% AP) without using refinement or other post-processing techniques, surpassing all existing bottom-up methods. HigherHRNet even surpasses all top-down methods on CrowdPose test (67.6% AP), suggesting its robustness in crowded scenes.
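The multi-resolution aggregation mentioned in the abstract can be illustrated with a minimal sketch: heatmaps predicted at different resolutions are brought to a common size and averaged. This is not the authors' implementation. The function names `upsample_nearest` and `aggregate_heatmaps` are hypothetical, the heatmaps are toy square lists, and nearest-neighbor upsampling is used here only to keep the example dependency-free.

```python
from typing import List

Heatmap = List[List[float]]


def upsample_nearest(heatmap: Heatmap, factor: int) -> Heatmap:
    """Upsample a 2D heatmap by an integer factor by repeating each value."""
    out: Heatmap = []
    for row in heatmap:
        # Repeat every value `factor` times horizontally.
        wide = [v for v in row for _ in range(factor)]
        # Repeat the widened row `factor` times vertically.
        out.extend(list(wide) for _ in range(factor))
    return out


def aggregate_heatmaps(heatmaps: List[Heatmap], target_size: int) -> Heatmap:
    """Average square heatmaps of different resolutions at one target size.

    Assumption (illustrative only): every heatmap is square and its side
    length divides `target_size` evenly.
    """
    total = [[0.0] * target_size for _ in range(target_size)]
    for hm in heatmaps:
        size = len(hm)
        if size == 0 or target_size % size != 0:
            raise ValueError("heatmap size must divide target_size")
        up = upsample_nearest(hm, target_size // size)
        for y in range(target_size):
            for x in range(target_size):
                total[y][x] += up[y][x]
    count = len(heatmaps)
    return [[v / count for v in row] for row in total]


# Toy example: a coarse 2x2 heatmap and a finer 4x4 heatmap for one keypoint.
low = [[0.0, 1.0],
       [0.0, 0.0]]
high = [[0.0, 0.0, 0.0, 1.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0]]
fused = aggregate_heatmaps([low, high], target_size=4)
```

In this toy case the coarse map only localizes the keypoint to a 2x2 block, while the fine map selects a single cell inside it, so the averaged map peaks at that cell. This mirrors the intuition that the higher-resolution branch sharpens localization for small persons.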

Cite

Text

Cheng et al. "HigherHRNet: Scale-Aware Representation Learning for Bottom-up Human Pose Estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00543

Markdown

[Cheng et al. "HigherHRNet: Scale-Aware Representation Learning for Bottom-up Human Pose Estimation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/cheng2020cvpr-higherhrnet/) doi:10.1109/CVPR42600.2020.00543

BibTeX

@inproceedings{cheng2020cvpr-higherhrnet,
  title     = {{HigherHRNet: Scale-Aware Representation Learning for Bottom-up Human Pose Estimation}},
  author    = {Cheng, Bowen and Xiao, Bin and Wang, Jingdong and Shi, Honghui and Huang, Thomas S. and Zhang, Lei},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00543},
  url       = {https://mlanthology.org/cvpr/2020/cheng2020cvpr-higherhrnet/}
}