Pose and Joint-Aware Action Recognition

Abstract

Recent progress in action recognition has mainly focused on RGB and optical-flow features. In this paper, we approach the problem of joint-based action recognition. Unlike other modalities, the constellation of joints and their motion provide a succinct representation of human movement for activity recognition. We present a new model for joint-based action recognition that first extracts motion features from each joint separately through a shared motion encoder before performing collective reasoning. Our joint selector module re-weights the joint information to select the most discriminative joints for the task. We also propose a novel joint-contrastive loss that pulls together groups of joint features conveying the same action. We strengthen the joint-based representations with a geometry-aware data augmentation technique that jitters pose heatmaps while retaining the dynamics of the action. We show large improvements over current state-of-the-art joint-based approaches on the JHMDB, HMDB, Charades, and AVA action recognition datasets. Late fusion with RGB- and flow-based approaches yields additional improvements. Our model also outperforms the existing baseline on Mimetics, a dataset of out-of-context actions.
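
To make the described architecture concrete, below is a minimal PyTorch sketch of a shared per-joint motion encoder followed by a joint selector that re-weights joints before pooled (collective) reasoning. All module choices here (a temporal CNN, an attention-style selector, coordinate inputs rather than pose heatmaps) and all dimensions are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch only: a shared per-joint motion encoder plus a joint
# selector that re-weights joints before pooling. Layer choices, shapes, and
# coordinate (rather than heatmap) inputs are assumptions for clarity.
import torch
import torch.nn as nn

class JointActionModel(nn.Module):
    def __init__(self, in_dim=2, feat_dim=128, num_classes=21):
        super().__init__()
        # Shared motion encoder, applied to every joint's trajectory separately.
        self.motion_encoder = nn.Sequential(
            nn.Conv1d(in_dim, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(64, feat_dim, kernel_size=3, padding=1),
            nn.AdaptiveAvgPool1d(1),        # pool over time
        )
        # Joint selector: one scalar weight per joint, so the most
        # discriminative joints dominate the pooled representation.
        self.selector = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, joints):
        # joints: (batch, num_joints, in_dim, time)
        b, j, c, t = joints.shape
        feats = self.motion_encoder(joints.reshape(b * j, c, t))  # (b*j, feat_dim, 1)
        feats = feats.squeeze(-1).reshape(b, j, -1)               # (b, j, feat_dim)
        weights = torch.softmax(self.selector(feats), dim=1)      # (b, j, 1)
        pooled = (weights * feats).sum(dim=1)                     # weighted joint pooling
        return self.classifier(pooled), feats

# Example usage with random data: 4 clips, 15 joints, (x, y) over 32 frames.
model = JointActionModel()
logits, joint_feats = model(torch.randn(4, 15, 2, 32))
```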
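
The joint-contrastive loss can likewise be sketched: per-joint features from clips of the same action are pulled together and features from different actions pushed apart. The snippet below uses a standard supervised-contrastive formulation as a hedged stand-in; the paper's exact loss may differ.

```python
# Hedged stand-in for the joint-contrastive loss using a standard
# supervised-contrastive formulation; not the authors' exact objective.
import torch
import torch.nn.functional as F

def joint_contrastive_loss(feats, labels, temperature=0.1):
    """feats:  (batch, num_joints, feat_dim) per-joint features
    labels: (batch,) action label shared by all joints of a clip"""
    b, j, d = feats.shape
    z = F.normalize(feats.reshape(b * j, d), dim=1)   # unit-norm joint features
    y = labels.repeat_interleave(j)                   # one label per joint feature
    sim = z @ z.t() / temperature                     # pairwise similarities
    self_mask = torch.eye(b * j, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)            # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = ((y.unsqueeze(0) == y.unsqueeze(1)) & ~self_mask).float()
    # mean log-likelihood of same-action pairs for each anchor feature
    loss = -(log_prob * pos).sum(1) / pos.sum(1).clamp(min=1)
    return loss.mean()
```

In training, such a term would typically be added to the classification objective, e.g. `loss = F.cross_entropy(logits, labels) + joint_contrastive_loss(joint_feats, labels)`.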

Cite

Text

Shah et al. "Pose and Joint-Aware Action Recognition." Winter Conference on Applications of Computer Vision, 2022.

Markdown

[Shah et al. "Pose and Joint-Aware Action Recognition." Winter Conference on Applications of Computer Vision, 2022.](https://mlanthology.org/wacv/2022/shah2022wacv-pose/)

BibTeX

@inproceedings{shah2022wacv-pose,
  title     = {{Pose and Joint-Aware Action Recognition}},
  author    = {Shah, Anshul and Mishra, Shlok and Bansal, Ankan and Chen, Jun-Cheng and Chellappa, Rama and Shrivastava, Abhinav},
  booktitle = {Winter Conference on Applications of Computer Vision},
  year      = {2022},
  pages     = {3850--3860},
  url       = {https://mlanthology.org/wacv/2022/shah2022wacv-pose/}
}