15 Keypoints Is All You Need
Abstract
Pose-tracking is an important problem that requires identifying unique human pose-instances and matching them temporally across different frames in a video. However, existing pose-tracking methods are unable to accurately model temporal relationships and require significant computation, often computing the tracks offline. We present an efficient multi-person pose-tracking method, KeyTrack, that relies only on keypoint information, without using any RGB or optical flow, to locate and track human keypoints in real-time. KeyTrack is a top-down approach that learns spatio-temporal pose relationships by modeling the multi-person pose-tracking problem as a novel Pose Entailment task using a Transformer-based architecture. Furthermore, KeyTrack uses a novel, parameter-free keypoint refinement technique that improves the keypoint estimates used by the Transformers. We achieve state-of-the-art results on the PoseTrack'17 and PoseTrack'18 benchmarks while using only a fraction of the computation used by most other methods for computing the tracking information.
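To make the keypoint-only matching idea concrete, here is a minimal sketch in Python of assigning track IDs across two frames from keypoint coordinates alone. It is not the paper's method: the abstract describes a learned Transformer that performs Pose Entailment, whereas this sketch substitutes a plain mean keypoint distance as a stand-in score. The names `pose_distance` and `match_tracks`, the greedy assignment, and the `max_dist` threshold are all hypothetical choices made for illustration.

```python
from math import hypot
from typing import Dict, List, Tuple

Pose = List[Tuple[float, float]]  # one (x, y) pair per keypoint, e.g. 15 per person


def pose_distance(a: Pose, b: Pose) -> float:
    """Mean Euclidean distance between corresponding keypoints.

    Assumption: a hand-written stand-in for a learned matching score.
    It is NOT the Transformer-based entailment model from the abstract.
    """
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def match_tracks(prev: Dict[int, Pose], curr: List[Pose],
                 max_dist: float = 50.0) -> Dict[int, int]:
    """Greedily map each pose index in `curr` to a track id.

    A pose inherits the id of the closest unclaimed pose in `prev`
    if that pose lies within `max_dist` (hypothetical threshold);
    otherwise it starts a new track.
    """
    assignments: Dict[int, int] = {}
    used = set()
    next_id = max(prev, default=-1) + 1
    for idx, pose in enumerate(curr):
        candidates = [(pose_distance(pose, p), tid)
                      for tid, p in prev.items() if tid not in used]
        best = min(candidates, default=None)
        if best is not None and best[0] <= max_dist:
            assignments[idx] = best[1]
            used.add(best[1])
        else:
            assignments[idx] = next_id
            next_id += 1
    return assignments


# Usage: two known tracks in the previous frame, three detections in the current one.
prev_frame = {0: [(0.0, 0.0)] * 15, 1: [(100.0, 100.0)] * 15}
curr_frame = [[(101.0, 100.0)] * 15, [(1.0, 0.0)] * 15, [(500.0, 500.0)] * 15]
result = match_tracks(prev_frame, curr_frame)
```

In this toy setup the first two detections inherit existing track ids and the third, which is far from both, starts a new track. A learned scorer would replace `pose_distance` while the surrounding assignment logic could stay the same.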
Cite
Text
Snower et al. "15 Keypoints Is All You Need." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00677
Markdown
[Snower et al. "15 Keypoints Is All You Need." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/snower2020cvpr-keypoints/) doi:10.1109/CVPR42600.2020.00677
BibTeX
@inproceedings{snower2020cvpr-keypoints,
title = {{15 Keypoints Is All You Need}},
author = {Snower, Michael and Kadav, Asim and Lai, Farley and Graf, Hans Peter},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2020},
doi = {10.1109/CVPR42600.2020.00677},
url = {https://mlanthology.org/cvpr/2020/snower2020cvpr-keypoints/}
}