UniPose: Unified Human Pose Estimation in Single Images and Videos

Abstract

We propose UniPose, a unified framework for human pose estimation, based on our "Waterfall" Atrous Spatial Pooling architecture, that achieves state-of-the-art results on several pose estimation metrics. UniPose incorporates contextual segmentation and joint localization to estimate the human pose in a single stage, with high accuracy, without relying on statistical postprocessing methods. The Waterfall module in UniPose leverages the efficiency of progressive filtering in the cascade architecture, while maintaining multi-scale fields-of-view comparable to spatial pyramid configurations. Additionally, our method is extended to UniPose-LSTM for multi-frame processing and achieves state-of-the-art results for temporal pose estimation in video. Our results on multiple datasets demonstrate that UniPose, with a ResNet backbone and Waterfall module, is a robust and efficient architecture for pose estimation, obtaining state-of-the-art results in single-person pose detection for both single images and videos.
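The contrast the abstract draws between a cascade and a spatial pyramid can be illustrated with simple receptive-field arithmetic. The sketch below is not the authors' implementation: it assumes 3x3 atrous convolutions with stride 1 and an illustrative set of dilation rates (6, 12, 18, 24), neither of which is stated in the abstract, and the function names are hypothetical. It compares the field-of-view of each branch when the branches run in parallel on the same input (pyramid style) with the field-of-view when each branch filters the output of the previous one and every branch output is kept (waterfall style).

```python
def atrous_fov(kernel, rate):
    """Field-of-view (in pixels, one axis) of a single atrous convolution, stride 1."""
    return 1 + (kernel - 1) * rate


def parallel_fovs(rates, kernel=3):
    """Pyramid-style: every branch sees the same input, so FOVs are independent."""
    return [atrous_fov(kernel, r) for r in rates]


def waterfall_fovs(rates, kernel=3):
    """Waterfall-style: each branch filters the previous branch's output,
    so FOVs accumulate, and each intermediate output is kept as a separate scale."""
    fovs = []
    fov = 1  # a single input pixel before any filtering
    for r in rates:
        fov += (kernel - 1) * r  # each stride-1 layer widens the FOV by (k - 1) * rate
        fovs.append(fov)
    return fovs


# Assumed dilation rates, chosen only for illustration.
rates = [6, 12, 18, 24]
print(parallel_fovs(rates))   # [13, 25, 37, 49]
print(waterfall_fovs(rates))  # [13, 37, 73, 121]
```

Under these assumptions both arrangements produce one output per rate, so multiple scales remain available, while the waterfall arrangement reuses earlier filtering and reaches larger fields-of-view with the same number of convolutions.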

Cite

Text

Artacho and Savakis. "UniPose: Unified Human Pose Estimation in Single Images and Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00706

Markdown

[Artacho and Savakis. "UniPose: Unified Human Pose Estimation in Single Images and Videos." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/artacho2020cvpr-unipose/) doi:10.1109/CVPR42600.2020.00706

BibTeX

@inproceedings{artacho2020cvpr-unipose,
  title     = {{UniPose: Unified Human Pose Estimation in Single Images and Videos}},
  author    = {Artacho, Bruno and Savakis, Andreas},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00706},
  url       = {https://mlanthology.org/cvpr/2020/artacho2020cvpr-unipose/}
}