Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition

Abstract

Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure "bag-of-words" model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.
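As a rough illustration of the first steps described in the abstract, the sketch below quantizes local descriptors to a visual vocabulary and then builds, for each interest point, a histogram of the words found at its nearest space-time neighbors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `quantize` and `neighborhood_histograms`, the fixed neighbor count `k`, and the per-axis `scale` weights are hypothetical, and the hand-set `scale` merely stands in for the class-specific distance functions that the paper learns. The orientation of neighbors relative to the central point and the recursive mapping to higher-level vocabularies are omitted.

```python
from collections import Counter


def quantize(descriptors, vocabulary):
    """Map each descriptor to the index of its nearest vocabulary word."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(vocabulary)), key=lambda w: sq_dist(d, vocabulary[w]))
        for d in descriptors
    ]


def neighborhood_histograms(positions, words, n_words, k, scale):
    """Count the words at the k nearest points around each interest point.

    positions: list of (x, y, t) interest-point coordinates.
    scale: per-axis weights (hypothetical stand-in for a learned distance).
    """
    def sq_dist(p, q):
        return sum((s * (a - b)) ** 2 for s, a, b in zip(scale, p, q))

    hists = []
    for i, p in enumerate(positions):
        # Rank every other point by its scaled space-time distance to point i.
        others = [j for j in range(len(positions)) if j != i]
        nearest = sorted(others, key=lambda j: sq_dist(p, positions[j]))[:k]
        counts = Counter(words[j] for j in nearest)
        hists.append([counts[w] for w in range(n_words)])
    return hists


# Toy data: two vocabulary words, four interest points.
vocabulary = [(0.0, 0.0), (10.0, 10.0)]
descriptors = [(1.0, 0.0), (9.0, 10.0), (0.0, 1.0), (10.0, 9.0)]
positions = [(0, 0, 0), (1, 0, 0), (0, 0, 5), (0, 0, 6)]

words = quantize(descriptors, vocabulary)
spatial_heavy = neighborhood_histograms(positions, words, 2, 1, (10, 10, 0.1))
temporal_heavy = neighborhood_histograms(positions, words, 2, 1, (1, 1, 1))
```

Changing `scale` changes which points count as near, and therefore which words fall into each neighborhood. That is the degree of freedom the abstract proposes to learn per action class rather than fix in advance.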

Cite

Text

Kovashka and Grauman. "Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2010. doi:10.1109/CVPR.2010.5539881

Markdown

[Kovashka and Grauman. "Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2010.](https://mlanthology.org/cvpr/2010/kovashka2010cvpr-learning/) doi:10.1109/CVPR.2010.5539881

BibTeX

@inproceedings{kovashka2010cvpr-learning,
  title     = {{Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition}},
  author    = {Kovashka, Adriana and Grauman, Kristen},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2010},
  pages     = {2046--2053},
  doi       = {10.1109/CVPR.2010.5539881},
  url       = {https://mlanthology.org/cvpr/2010/kovashka2010cvpr-learning/}
}