Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation

Abstract

Occlusion poses a major challenge for monocular multi-person 3D human pose estimation because occluders vary widely in shape, appearance, and position. While existing methods try to handle occlusion with pose priors/constraints, data augmentation, or implicit reasoning, they still fail to generalize to unseen poses or occlusion cases and may make large mistakes when multiple people are present. Inspired by the remarkable ability of humans to infer occluded joints from visible cues, we develop a method that explicitly models this process and significantly improves bottom-up multi-person human pose estimation with or without occlusions. First, we split the task into two subtasks, visible keypoint detection and occluded keypoint reasoning, and propose a Deeply Supervised Encoder Distillation (DSED) network to solve the second. To train our model, we propose a Skeleton-guided human Shape Fitting (SSF) approach that generates pseudo occlusion labels on existing datasets, enabling explicit occlusion reasoning. Experiments show that explicitly learning from occlusions improves human pose estimation. In addition, exploiting feature-level information of visible joints allows us to reason about occluded joints more accurately. Our method outperforms both the state-of-the-art top-down and bottom-up methods on several benchmarks.

Cite

Text

Liu et al. "Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20065-6_29

Markdown

[Liu et al. "Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/liu2022eccv-explicit/) doi:10.1007/978-3-031-20065-6_29

BibTeX

@inproceedings{liu2022eccv-explicit,
  title     = {{Explicit Occlusion Reasoning for Multi-Person 3D Human Pose Estimation}},
  author    = {Liu, Qihao and Zhang, Yi and Bai, Song and Yuille, Alan},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20065-6_29},
  url       = {https://mlanthology.org/eccv/2022/liu2022eccv-explicit/}
}