A Top-Down Approach to Articulated Human Pose Estimation and Tracking

Abstract

Multi-person human pose estimation and pose tracking in videos are both challenging tasks. Existing methods fall into two groups: top-down and bottom-up approaches. In this paper, following the top-down approach, we aim to build a strong baseline system with three modules: a human candidate detector, a single-person pose estimator, and a human pose tracker. First, we choose a generic object detector from among state-of-the-art methods to detect human candidates. Then, a cascaded pyramid network is used to estimate the corresponding human pose. Finally, we use a flow-based pose tracker to associate keypoints across frames, i.e., to assign each human candidate a unique and temporally consistent id, for the purpose of multi-target pose tracking. We conduct extensive ablative experiments to validate various choices of models and configurations. We take part in two ECCV’18 PoseTrack challenges (https://posetrack.net/workshops/eccv2018/posetrack_eccv_2018_results.html): pose estimation and pose tracking.
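The abstract's three-stage pipeline (detect candidates, estimate each pose, then assign temporally consistent ids) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the real system uses a CNN detector, a cascaded pyramid network, and optical-flow-based matching, whereas here the tracker matches bounding boxes across frames by plain IoU, and the class name `GreedyTracker` is a hypothetical stand-in, purely to show how ids persist across frames.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0


class GreedyTracker:
    """Assigns each detected human candidate a persistent id by greedily
    matching its box against the previous frame's tracks (a simplified
    stand-in for the paper's flow-based keypoint association)."""

    def __init__(self, iou_thresh=0.3):
        self.iou_thresh = iou_thresh
        self.prev = []        # (id, box) pairs from the last frame
        self.next_id = 0

    def update(self, boxes):
        assigned, used = [], set()
        for box in boxes:
            # Pick the best-overlapping, not-yet-claimed previous track.
            best = max(
                ((tid, iou(box, pbox)) for tid, pbox in self.prev
                 if tid not in used),
                key=lambda t: t[1], default=(None, 0.0))
            if best[1] >= self.iou_thresh:
                tid = best[0]     # continue an existing track
                used.add(tid)
            else:
                tid = self.next_id  # no overlap: start a new track
                self.next_id += 1
            assigned.append((tid, box))
        self.prev = assigned
        return [tid for tid, _ in assigned]
```

For example, a box that shifts slightly between two frames keeps its id, while a box appearing in an empty region receives a fresh one; in the full system the per-box pose estimate would travel with that id.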

Cite

Text

Ning et al. "A Top-Down Approach to Articulated Human Pose Estimation and Tracking." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11012-3_20

Markdown

[Ning et al. "A Top-Down Approach to Articulated Human Pose Estimation and Tracking." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/ning2018eccvw-topdown/) doi:10.1007/978-3-030-11012-3_20

BibTeX

@inproceedings{ning2018eccvw-topdown,
  title     = {{A Top-Down Approach to Articulated Human Pose Estimation and Tracking}},
  author    = {Ning, Guanghan and Liu, Ping and Fan, Xiaochuan and Zhang, Chi},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {227--234},
  doi       = {10.1007/978-3-030-11012-3_20},
  url       = {https://mlanthology.org/eccvw/2018/ning2018eccvw-topdown/}
}