Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition
Abstract
Recent work shows how to use local spatio-temporal features to learn models of realistic human actions from video. However, existing methods typically rely on a predefined spatial binning of the local descriptors to impose spatial information beyond a pure "bag-of-words" model, and thus may fail to capture the most informative space-time relationships. We propose to learn the shapes of space-time feature neighborhoods that are most discriminative for a given action category. Given a set of training videos, our method first extracts local motion and appearance features, quantizes them to a visual vocabulary, and then forms candidate neighborhoods consisting of the words associated with nearby points and their orientation with respect to the central interest point. Rather than dictate a particular scaling of the spatial and temporal dimensions to determine which points are near, we show how to learn the class-specific distance functions that form the most informative configurations. Descriptors for these variable-sized neighborhoods are then recursively mapped to higher-level vocabularies, producing a hierarchy of space-time configurations at successively broader scales. Our approach yields state-of-the-art performance on the UCF Sports and KTH datasets.
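The neighborhood construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the fixed `time_scale` weight (the paper learns class-specific distance functions instead), the choice of `k` nearest points, and the 8-octant orientation code are all assumptions made for illustration.

```python
import numpy as np

def quantize(descriptors, vocabulary):
    """Assign each descriptor to the index of its nearest vocabulary word."""
    # Pairwise squared Euclidean distances, shape (n_points, n_words).
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def neighborhood_histograms(positions, words, n_words, k, time_scale):
    """Histogram of (word, octant) pairs over each point's k nearest neighbors.

    positions: (n, 3) array of (x, y, t) interest-point coordinates.
    words: (n,) integer array of vocabulary indices, one per point.
    time_scale: hypothetical fixed weight on the temporal axis (assumption);
    it stands in for the learned distance function. Requires k < n.
    """
    n = len(positions)
    scaled = positions * np.array([1.0, 1.0, time_scale])
    hists = np.zeros((n, n_words * 8))
    for i in range(n):
        offsets = scaled - scaled[i]
        dist = np.linalg.norm(offsets, axis=1)
        dist[i] = np.inf  # exclude the central point itself
        nearest = np.argsort(dist)[:k]
        # Octant index from the signs of (dx, dy, dt): a coarse orientation code.
        signs = (offsets[nearest] >= 0).astype(int)
        octant = signs[:, 0] * 4 + signs[:, 1] * 2 + signs[:, 2]
        # Accumulate counts; np.add.at handles repeated bin indices correctly.
        np.add.at(hists[i], words[nearest] * 8 + octant, 1)
    return hists
```

Changing `time_scale` changes which points count as near, which is the degree of freedom the abstract proposes to learn per class rather than fix by hand. The resulting histograms could in turn be quantized to form the next level of a hierarchy, though that step is not shown here.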
Cite
Text
Kovashka and Grauman. "Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2010. doi:10.1109/CVPR.2010.5539881
Markdown
[Kovashka and Grauman. "Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2010.](https://mlanthology.org/cvpr/2010/kovashka2010cvpr-learning/) doi:10.1109/CVPR.2010.5539881
BibTeX
@inproceedings{kovashka2010cvpr-learning,
title = {{Learning a Hierarchy of Discriminative Space-Time Neighborhood Features for Human Action Recognition}},
author = {Kovashka, Adriana and Grauman, Kristen},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2010},
pages = {2046-2053},
doi = {10.1109/CVPR.2010.5539881},
url = {https://mlanthology.org/cvpr/2010/kovashka2010cvpr-learning/}
}