Deep Moving Poselets for Video Based Action Recognition

Abstract

We propose a new approach to action classification in video, which uses deep appearance and motion features extracted from spatio-temporal volumes defined along body part trajectories to learn mid-level classifiers called deep moving poselets. A deep moving poselet is a classifier that captures a characteristic body part configuration, with a specific appearance and undergoing a specific movement. By having this mid-level representation of a body part be shared across action classes and by learning it jointly with action classifiers, we obtain a representation that is interpretable, shared and discriminative. In addition, by using sparsity-inducing norms to regularize action classifiers, we can reduce the number of deep moving poselets used by each class without hurting performance. Experiments show that the proposed method achieves state-of-the-art performance on the popular and challenging sub-JHMDB and MSR Daily Activity datasets.

Cite

Text

Mavroudi et al. "Deep Moving Poselets for Video Based Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2017. doi:10.1109/WACV.2017.20

Markdown

[Mavroudi et al. "Deep Moving Poselets for Video Based Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2017.](https://mlanthology.org/wacv/2017/mavroudi2017wacv-deep/) doi:10.1109/WACV.2017.20

BibTeX

@inproceedings{mavroudi2017wacv-deep,
  title     = {{Deep Moving Poselets for Video Based Action Recognition}},
  author    = {Mavroudi, Effrosyni and Tao, Lingling and Vidal, René},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2017},
  pages     = {111--120},
  doi       = {10.1109/WACV.2017.20},
  url       = {https://mlanthology.org/wacv/2017/mavroudi2017wacv-deep/}
}