Learning Action Primitives for Multi-Level Video Event Understanding

Abstract

Human action categories exhibit significant intra-class variation. Changes in viewpoint, human appearance, and the temporal evolution of an action confound recognition algorithms. To address this, we present an approach to discover action primitives, sub-categories of action classes that allow us to model this intra-class variation. We learn action primitives and their interrelations in a multi-level spatio-temporal model for action recognition. Action primitives are discovered via a data-driven clustering approach that focuses on repeatable, discriminative sub-categories. Higher-level interactions between action primitives and the actions of a set of people present in a scene are learned. Empirical results demonstrate that these action primitives can be effectively localized, and using them to model action classes improves action recognition performance on challenging datasets.
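The core idea of discovering sub-categories within an action class can be sketched as clustering that class's feature vectors. The snippet below is a minimal, hypothetical illustration using a plain k-means loop with farthest-point initialization; the paper's actual discovery procedure additionally enforces repeatability and discriminativeness of clusters, which this sketch omits. The function name `discover_primitives` and the synthetic features are assumptions for illustration only.

```python
import numpy as np

def _init_centers(features, k):
    # Farthest-point initialization: spread the initial centers apart
    # so well-separated sub-categories are found deterministically.
    centers = [features[0].astype(float)]
    for _ in range(1, k):
        dists = np.min(
            [np.linalg.norm(features - c, axis=1) for c in centers], axis=0
        )
        centers.append(features[dists.argmax()].astype(float))
    return np.array(centers)

def discover_primitives(features, k, iters=20):
    """Cluster one action class's instance features into k sub-categories,
    a simplified stand-in for data-driven action-primitive discovery."""
    centers = _init_centers(features, k)
    labels = np.zeros(len(features), dtype=int)
    for _ in range(iters):
        # Assign each instance to its nearest cluster center.
        d = np.linalg.norm(features[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers; keep the old center if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = features[labels == j].mean(axis=0)
    return labels, centers

# Toy usage: two obvious sub-categories in one "action class".
feats = np.vstack([np.zeros((5, 2)), 10.0 * np.ones((5, 2))])
labels, centers = discover_primitives(feats, k=2)
```

Each resulting cluster plays the role of one action primitive; in the full model these primitives would then be linked to higher-level person and scene labels.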

Cite

Text

Lan et al. "Learning Action Primitives for Multi-Level Video Event Understanding." European Conference on Computer Vision Workshops, 2014. doi:10.1007/978-3-319-16199-0_7

Markdown

[Lan et al. "Learning Action Primitives for Multi-Level Video Event Understanding." European Conference on Computer Vision Workshops, 2014.](https://mlanthology.org/eccvw/2014/lan2014eccvw-learning/) doi:10.1007/978-3-319-16199-0_7

BibTeX

@inproceedings{lan2014eccvw-learning,
  title     = {{Learning Action Primitives for Multi-Level Video Event Understanding}},
  author    = {Lan, Tian and Chen, Lei and Deng, Zhiwei and Zhou, Guang-Tong and Mori, Greg},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2014},
  pages     = {95--110},
  doi       = {10.1007/978-3-319-16199-0_7},
  url       = {https://mlanthology.org/eccvw/2014/lan2014eccvw-learning/}
}