The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation
Abstract
We introduce CenterGroup, an attention-based framework to estimate human poses from a set of identity-agnostic keypoints and person center predictions in an image. Our approach uses a transformer to obtain context-aware embeddings for all detected keypoints and centers, and then applies multi-head attention to directly group joints into their corresponding person centers. While most bottom-up methods rely on non-learnable clustering at inference, CenterGroup uses a fully differentiable attention mechanism that we train end-to-end together with our keypoint detector. As a result, our method obtains state-of-the-art performance while running inference up to 2.5x faster than competing bottom-up methods.
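To make the grouping step concrete, here is a minimal sketch of center-keypoint grouping with multi-head attention. It assumes the context-aware keypoint and center embeddings have already been produced (the transformer stage is not shown). Each keypoint acts as a query and each center as a key, and every head produces a softmax distribution over centers. The names `group_keypoints`, `w_q`, and `w_k` are hypothetical, the projection matrices stand in for learned weights, and averaging the heads' probabilities is an illustrative assumption rather than the paper's exact head design.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Multiply matrix w (out_dim x in_dim) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_keypoints(
    keypoint_emb: Sequence[Sequence[float]],
    center_emb: Sequence[Sequence[float]],
    w_q: Sequence[Matrix],
    w_k: Sequence[Matrix],
) -> List[int]:
    """Assign each keypoint to one person center via multi-head attention.

    w_q[h] and w_k[h] are hypothetical query/key projections for head h
    (fixed here; learned in a real model). Each head yields a softmax over
    centers, the heads are averaged (an assumption), and each keypoint is
    assigned to the center with the highest averaged probability.
    """
    num_heads = len(w_q)
    assignments = []
    for kp in keypoint_emb:
        avg_probs = [0.0] * len(center_emb)
        for h in range(num_heads):
            q = matvec(w_q[h], kp)
            scale = math.sqrt(len(q))  # scaled dot-product attention
            scores = []
            for c in center_emb:
                k = matvec(w_k[h], c)
                scores.append(sum(qi * ki for qi, ki in zip(q, k)) / scale)
            # Attention weights of this keypoint over all candidate centers.
            for i, p in enumerate(softmax(scores)):
                avg_probs[i] += p / num_heads
        # Hard grouping at inference: pick the most likely center.
        assignments.append(max(range(len(avg_probs)), key=avg_probs.__getitem__))
    return assignments


if __name__ == "__main__":
    centers = [[1.0, 0.0], [0.0, 1.0]]
    keypoints = [[0.9, 0.1], [0.2, 0.8], [1.0, -0.5]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(group_keypoints(keypoints, centers, [identity], [identity]))  # [0, 1, 0]
```

Every step up to the softmax is differentiable, so in a setup like this a training loss could be applied to the soft attention probabilities, with only the final argmax used at inference. That is the contrast the abstract draws with non-learnable clustering.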
Cite
Text
Brasó et al. "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01164
Markdown
[Brasó et al. "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/braso2021iccv-center/) doi:10.1109/ICCV48922.2021.01164
BibTeX
@inproceedings{braso2021iccv-center,
title = {{The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation}},
author = {Brasó, Guillem and Kister, Nikita and Leal-Taixé, Laura},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {11853-11863},
doi = {10.1109/ICCV48922.2021.01164},
url = {https://mlanthology.org/iccv/2021/braso2021iccv-center/}
}