The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation

Abstract

We introduce CenterGroup, an attention-based framework to estimate human poses from a set of identity-agnostic keypoints and person center predictions in an image. Our approach uses a transformer to obtain context-aware embeddings for all detected keypoints and centers and then applies multi-head attention to directly group joints into their corresponding person centers. While most bottom-up methods rely on non-learnable clustering at inference, CenterGroup uses a fully differentiable attention mechanism that we train end-to-end together with our keypoint detector. As a result, our method obtains state-of-the-art performance with up to 2.5x faster inference time than competing bottom-up methods.
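The grouping idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that each person center acts as an attention query over the detected keypoints of each joint type, and that the pose is read out as the attention-weighted average of keypoint locations. The function name `group_by_attention`, the hand-made two-dimensional embeddings, and the single unprojected attention head are all hypothetical simplifications. In CenterGroup the embeddings come from a transformer and the attention is multi-head and learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def group_by_attention(centers, keypoints, num_types):
    """Toy center-to-keypoint grouping (illustrative assumption only).

    centers:   list of embedding vectors, one per detected person center
    keypoints: list of (embedding, joint_type, (x, y)) tuples
    Returns poses[c][t] = (x, y), or None if no keypoint of type t exists.
    """
    scale = 1.0 / math.sqrt(len(centers[0]))  # scaled dot-product attention
    poses = []
    for center in centers:
        pose = []
        for joint_type in range(num_types):
            # Candidates are restricted to keypoints of the queried type.
            cands = [(e, xy) for e, t, xy in keypoints if t == joint_type]
            if not cands:
                pose.append(None)
                continue
            scores = [scale * sum(a * b for a, b in zip(center, e))
                      for e, _ in cands]
            weights = softmax(scores)
            # Soft selection: a weighted average keeps the step differentiable.
            x = sum(w * xy[0] for w, (_, xy) in zip(weights, cands))
            y = sum(w * xy[1] for w, (_, xy) in zip(weights, cands))
            pose.append((x, y))
        poses.append(pose)
    return poses


# Two people; embeddings are hand-picked so that each keypoint matches one center.
centers = [[8.0, 0.0], [0.0, 8.0]]
keypoints = [
    ([8.0, 0.0], 0, (10.0, 20.0)),  # joint type 0, person A
    ([0.0, 8.0], 0, (50.0, 60.0)),  # joint type 0, person B
    ([8.0, 0.0], 1, (12.0, 30.0)),  # joint type 1, person A
    ([0.0, 8.0], 1, (52.0, 70.0)),  # joint type 1, person B
]
poses = group_by_attention(centers, keypoints, num_types=2)
```

Because the assignment is a softmax-weighted sum rather than a hard clustering step, gradients can flow from the predicted joint locations back to the embeddings, which is the property the abstract refers to as fully differentiable grouping.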

Cite

Text

Brasó et al. "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01164

Markdown

[Brasó et al. "The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/braso2021iccv-center/) doi:10.1109/ICCV48922.2021.01164

BibTeX

@inproceedings{braso2021iccv-center,
  title     = {{The Center of Attention: Center-Keypoint Grouping via Attention for Multi-Person Pose Estimation}},
  author    = {Brasó, Guillem and Kister, Nikita and Leal-Taixé, Laura},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {11853--11863},
  doi       = {10.1109/ICCV48922.2021.01164},
  url       = {https://mlanthology.org/iccv/2021/braso2021iccv-center/}
}