Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Abstract
We present EgoTAP, a heatmap-to-3D pose lifting method for highly accurate stereo egocentric 3D pose estimation. Severe self-occlusion and out-of-view limbs in egocentric camera views make accurate pose estimation a challenging problem. To address the challenge, prior methods employ joint heatmaps, probabilistic 2D representations of the body pose, but heatmap-to-3D pose conversion remains an inaccurate process. We propose a novel heatmap-to-3D lifting method composed of the Grid ViT Encoder and the Propagation Network. The Grid ViT Encoder summarizes joint heatmaps into an effective feature embedding using self-attention. The Propagation Network then estimates the 3D pose by utilizing skeletal information to better estimate the positions of obscured joints. Our method significantly outperforms the previous state of the art both qualitatively and quantitatively, demonstrated by a 23.9% error reduction in the MPJPE metric. Our source code is available on GitHub.
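The abstract describes a two-stage pipeline: self-attention over heatmap grid features, followed by skeleton-guided feature propagation. Below is a minimal, hypothetical PyTorch sketch of such a pipeline; all module names, dimensions, and the toy skeleton are illustrative assumptions, not the authors' implementation (see their GitHub repository for the actual code).

# Hypothetical sketch of the two-stage lifting pipeline from the abstract:
# Grid ViT Encoder -> Propagation Network. Dimensions, module names, and
# the toy skeleton below are assumptions for illustration only.
import torch
import torch.nn as nn

class GridViTEncoder(nn.Module):
    """Summarizes per-joint heatmaps into joint embeddings via self-attention."""
    def __init__(self, num_joints=15, heatmap_size=64, patch=16, dim=128, depth=4):
        super().__init__()
        self.patch_embed = nn.Conv2d(num_joints, dim, kernel_size=patch, stride=patch)
        num_patches = (heatmap_size // patch) ** 2
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.to_joint = nn.Linear(num_patches * dim, num_joints * dim)
        self.num_joints, self.dim = num_joints, dim

    def forward(self, heatmaps):                       # (B, J, H, W)
        x = self.patch_embed(heatmaps).flatten(2).transpose(1, 2)  # (B, P, dim)
        x = self.encoder(x + self.pos_embed)           # self-attention over grid patches
        x = self.to_joint(x.flatten(1))                # one feature vector per joint
        return x.view(-1, self.num_joints, self.dim)   # (B, J, dim)

class PropagationNetwork(nn.Module):
    """Refines each joint's feature with its parent's along the kinematic tree."""
    def __init__(self, parents, dim=128):
        super().__init__()
        self.parents = parents                          # parents[j] = parent index, -1 for root
        self.fuse = nn.ModuleList(nn.Linear(2 * dim, dim) for _ in parents)
        self.head = nn.Linear(dim, 3)                   # per-joint 3D coordinates

    def forward(self, feats):                           # (B, J, dim)
        out = list(feats.unbind(1))
        for j, p in enumerate(self.parents):            # root-to-leaf order assumed
            if p >= 0:                                  # propagate parent context to child
                out[j] = torch.relu(self.fuse[j](torch.cat([out[j], out[p]], -1)))
        return self.head(torch.stack(out, 1))           # (B, J, 3)

# Toy usage with a 5-joint chain skeleton (illustrative only).
encoder = GridViTEncoder(num_joints=5)
lifter = PropagationNetwork(parents=[-1, 0, 1, 2, 3])
pose3d = lifter(encoder(torch.randn(2, 5, 64, 64)))     # (2, 5, 3)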
Cite
Text
Kang and Lee. "Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00086
Markdown
[Kang and Lee. "Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/kang2024cvpr-attentionpropagation/) doi:10.1109/CVPR52733.2024.00086
BibTeX
@inproceedings{kang2024cvpr-attentionpropagation,
title = {{Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting}},
author = {Kang, Taeho and Lee, Youngki},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {842--851},
doi = {10.1109/CVPR52733.2024.00086},
url = {https://mlanthology.org/cvpr/2024/kang2024cvpr-attentionpropagation/}
}