Action Recognition and Localization by Hierarchical Space-Time Segments

Abstract

We propose Hierarchical Space-Time Segments as a new representation for action recognition and localization. This representation has a two-level hierarchy. The first level comprises the root space-time segments that may contain a human body. The second level comprises multi-grained space-time segments that contain parts of the root. We present an unsupervised method to generate this representation from video, which extracts both static and non-static relevant space-time segments, and also preserves their hierarchical and temporal relationships. Using a simple linear SVM on the resultant bag of hierarchical space-time segments representation, we attain better than, or comparable to, state-of-the-art action recognition performance on two challenging benchmark datasets, and at the same time produce good action localization results.
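The classification step described above follows the standard bag-of-features pipeline: quantize each space-time segment descriptor into a codeword, pool the codewords of a video into a histogram, and score that histogram with a linear SVM decision function. A minimal sketch of that pipeline is below; the codebook size, codeword assignments, and weight values are invented for illustration and are not the paper's actual model.

```python
# Hypothetical sketch of a bag-of-segments + linear SVM classifier.
# All numbers (codebook size, codeword IDs, weights, bias) are made up.

CODEBOOK_SIZE = 8  # assumed number of quantized segment descriptors


def bag_of_segments(word_ids, codebook_size=CODEBOOK_SIZE):
    """L1-normalized histogram over a video's quantized segment codewords."""
    hist = [0.0] * codebook_size
    for w in word_ids:
        hist[w] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def linear_svm_score(x, weights, bias):
    """Decision value of a trained linear SVM: w . x + b; sign gives the class."""
    return sum(wi * xi for wi, xi in zip(weights, x)) + bias


# Toy example: root and part segments of one video, already quantized.
video_words = [0, 0, 3, 5, 3, 7]                  # hypothetical codeword IDs
x = bag_of_segments(video_words)
w = [0.9, -0.2, 0.0, 0.4, 0.0, -0.1, 0.0, 0.3]    # invented trained weights
b = -0.1                                          # invented bias
score = linear_svm_score(x, w, b)
label = "action" if score > 0 else "background"
```

In practice the histogram would be built over a learned codebook of root and part segment descriptors, and the weights would come from SVM training; the sketch only shows the shape of the representation and the linearity of the decision.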

Cite

Text

Ma et al. "Action Recognition and Localization by Hierarchical Space-Time Segments." International Conference on Computer Vision, 2013. doi:10.1109/ICCV.2013.341

Markdown

[Ma et al. "Action Recognition and Localization by Hierarchical Space-Time Segments." International Conference on Computer Vision, 2013.](https://mlanthology.org/iccv/2013/ma2013iccv-action/) doi:10.1109/ICCV.2013.341

BibTeX

@inproceedings{ma2013iccv-action,
  title     = {{Action Recognition and Localization by Hierarchical Space-Time Segments}},
  author    = {Ma, Shugao and Zhang, Jianming and Ikizler-Cinbis, Nazli and Sclaroff, Stan},
  booktitle = {International Conference on Computer Vision},
  year      = {2013},
  doi       = {10.1109/ICCV.2013.341},
  url       = {https://mlanthology.org/iccv/2013/ma2013iccv-action/}
}