15 Keypoints Is All You Need

Abstract

Pose-tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames in a video. However, existing pose-tracking methods are unable to accurately model temporal relationships and require significant computation, often computing the tracks offline. We present an efficient multi-person pose-tracking method, KeyTrack, which relies only on keypoint information, without using any RGB or optical flow information, to locate and track human keypoints in real time. KeyTrack is a top-down approach that learns spatio-temporal pose relationships by modeling the multi-person pose-tracking problem as a novel Pose Entailment task using a Transformer-based architecture. Furthermore, KeyTrack uses a novel, parameter-free keypoint refinement technique that improves the keypoint estimates used by the Transformers. We achieve state-of-the-art results on the PoseTrack'17 and PoseTrack'18 benchmarks while using only a fraction of the computation that most other methods require to compute tracking information.

Cite

Text

Snower et al. "15 Keypoints Is All You Need." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00677

Markdown

[Snower et al. "15 Keypoints Is All You Need." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/snower2020cvpr-keypoints/) doi:10.1109/CVPR42600.2020.00677

BibTeX

@inproceedings{snower2020cvpr-keypoints,
  title     = {{15 Keypoints Is All You Need}},
  author    = {Snower, Michael and Kadav, Asim and Lai, Farley and Graf, Hans Peter},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00677},
  url       = {https://mlanthology.org/cvpr/2020/snower2020cvpr-keypoints/}
}