View-Invariant Modeling and Recognition of Human Actions Using Grammars

Abstract

In this paper, we represent human actions as sentences generated by a language built over atomic body poses, or phonemes. Knowledge of body pose is stored only implicitly, as a set of silhouettes seen from multiple viewpoints; no explicit 3D poses or body models are used, and individual body parts are not identified. Actions and their constituent atomic poses are extracted from a set of multi-view, multi-person video sequences by an automatic keyframe selection process and are used to automatically construct a probabilistic context-free grammar (PCFG) that encodes the syntax of the actions. Given a new video taken from a single viewpoint, we can parse it to recognize actions and changes in viewpoint simultaneously. Experimental results are provided.
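
The pipeline sketched in the abstract, keyframe selection mapping video frames to atomic pose symbols followed by PCFG parsing of the resulting symbol string, can be illustrated with a toy example. The sketch below is hypothetical and not the authors' implementation: the grammar, the pose labels, and the viewpoint suffixes (_v0, _v1) are invented stand-ins, and NLTK's ViterbiParser stands in for whatever parser the paper employs.

import nltk

# Toy PCFG over atomic pose symbols ("phonemes"). In the paper the
# grammar is constructed automatically from multi-view training video;
# here it is written by hand. Each terminal carries a hypothetical
# viewpoint tag (_v0, _v1), so a single parse can recover both the
# action and a change of viewpoint.
grammar = nltk.PCFG.fromstring("""
    ACTIONS -> ACTION ACTIONS [0.5] | ACTION [0.5]
    ACTION  -> WALK [0.6] | TURN [0.4]
    WALK    -> 'stand_v0' 'midstride_v0' 'stand_v0' [1.0]
    TURN    -> 'stand_v0' 'stand_v1' [1.0]
""")

# Pose-symbol string, as if produced by matching input silhouettes
# against the stored multi-view keyframe silhouettes.
observed = ['stand_v0', 'midstride_v0', 'stand_v0', 'stand_v0', 'stand_v1']

# Viterbi parsing returns the most probable derivation, i.e. the
# recognized action sequence and its probability under the grammar.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse(observed):
    tree.pretty_print()
    print('P(parse) =', tree.prob())

In this toy run, the best parse segments the string into a WALK followed by a TURN, with the viewpoint change signaled by the _v0 to _v1 transition in the terminals.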

Cite

Text

Ogale et al. "View-Invariant Modeling and Recognition of Human Actions Using Grammars." European Conference on Computer Vision, 2006. doi:10.1007/978-3-540-70932-9_9

Markdown

[Ogale et al. "View-Invariant Modeling and Recognition of Human Actions Using Grammars." European Conference on Computer Vision, 2006.](https://mlanthology.org/eccv/2006/ogale2006eccv-view/) doi:10.1007/978-3-540-70932-9_9

BibTeX

@inproceedings{ogale2006eccv-view,
  title     = {{View-Invariant Modeling and Recognition of Human Actions Using Grammars}},
  author    = {Ogale, Abhijit S. and Karapurkar, Alap and Aloimonos, Yiannis},
  booktitle = {European Conference on Computer Vision},
  year      = {2006},
  pages     = {115--126},
  doi       = {10.1007/978-3-540-70932-9_9},
  url       = {https://mlanthology.org/eccv/2006/ogale2006eccv-view/}
}