SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation
Abstract
High-resolution representation is essential for achieving good performance in human pose estimation models. To obtain such features, existing works utilize high-resolution input images or fine-grained image tokens. However, this dense high-resolution representation brings a significant computational burden. In this paper, we address the following question: "Since only sparse human keypoint locations are detected in human pose estimation, is it really necessary to describe the whole image in a dense, high-resolution manner?" Based on dynamic transformer models, we propose a framework that uses only Sparse High-resolution Representations for human Pose estimation (SHaRPose). SHaRPose consists of two stages. At the coarse stage, the relations between image regions and keypoints are dynamically mined while a coarse estimation is generated. Then, a quality predictor is applied to decide whether the coarse estimation results should be refined. At the fine stage, SHaRPose builds sparse high-resolution representations only on the regions related to the keypoints and provides refined high-precision human pose estimations. Extensive experiments demonstrate that the proposed method improves both accuracy and speed. Specifically, compared to the state-of-the-art method ViTPose, our model SHaRPose-Base achieves 77.4 AP (+0.5 AP) on the COCO validation set and 76.7 AP (+0.5 AP) on the COCO test-dev set, and runs inference 1.4x faster than ViTPose-Base. Code is available at https://github.com/AnxQ/sharpose.
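The two-stage control flow described in the abstract can be sketched as below. This is a minimal illustration only, not the authors' implementation: the function names, the callable signatures, the input given to the quality predictor, and the `quality_threshold` and `keep_ratio` parameters are all assumptions introduced here. The actual model is in the linked repository.

```python
from typing import Callable, List, Sequence, Tuple

Pose = List[Tuple[float, float]]  # one (x, y) per keypoint


def select_regions(relevance: Sequence[float], keep_ratio: float) -> List[int]:
    """Return indices of the most keypoint-relevant regions, in ascending order.

    `relevance` holds one score per image region (hypothetical format).
    """
    k = max(1, int(len(relevance) * keep_ratio))
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return sorted(ranked[:k])


def coarse_to_fine(
    image,
    coarse_stage: Callable,       # image -> (coarse pose, per-region relevance)
    quality_predictor: Callable,  # coarse pose -> score in [0, 1] (assumed input)
    fine_stage: Callable,         # (image, region indices) -> refined pose
    quality_threshold: float = 0.5,  # hypothetical value
    keep_ratio: float = 0.25,        # hypothetical value
) -> Tuple[Pose, bool]:
    """Return (pose, was_refined)."""
    # Coarse stage: estimate the pose and score how related each region is
    # to the keypoints.
    pose, relevance = coarse_stage(image)

    # Quality gate: keep the coarse result if it is judged good enough.
    if quality_predictor(pose) >= quality_threshold:
        return pose, False

    # Fine stage: build high-resolution features only on the selected regions.
    regions = select_regions(relevance, keep_ratio)
    return fine_stage(image, regions), True
```

Under these assumptions, the savings come from two places: images whose coarse estimate passes the gate skip the fine stage entirely, and the remaining images are refined on a subset of regions rather than on the whole image.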
Cite
Text
An et al. "SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I2.27826
Markdown
[An et al. "SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/an2024aaai-sharpose/) doi:10.1609/AAAI.V38I2.27826
BibTeX
@inproceedings{an2024aaai-sharpose,
title = {{SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation}},
author = {An, Xiaoqi and Zhao, Lin and Gong, Chen and Wang, Nannan and Wang, Di and Yang, Jian},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {691-699},
doi = {10.1609/AAAI.V38I2.27826},
url = {https://mlanthology.org/aaai/2024/an2024aaai-sharpose/}
}