Discriminative Hierarchical Rank Pooling for Activity Recognition

Abstract

We present hierarchical rank pooling, a video sequence encoding method for activity recognition. It consists of a network of rank pooling functions that captures the dynamics of rich convolutional neural network features within a video sequence. By stacking non-linear feature functions and rank pooling over one another, we obtain a high-capacity dynamic encoding mechanism, which is used for action recognition. We present a method for jointly learning the video representation and activity classifier parameters. Our method obtains state-of-the-art results on three important activity recognition benchmarks: 76.7% on Hollywood2, 66.9% on HMDB51, and 91.4% on UCF101.
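The stacking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the rank pooling step here is replaced by a simple closed-form approximation (a weighted sum of frames with weights 2t - T - 1, so later frames count more), whereas the paper learns the pooling functions by optimization. The function names, the signed square root non-linearity, and the `window`, `stride`, and `layers` parameters are all assumptions made for illustration.

```python
import math


def approx_rank_pool(frames):
    """Summarize T equal-length feature vectors as one order-sensitive vector.

    Assumption: a closed-form stand-in for rank pooling that weights frame t
    (t = 1..T) by alpha_t = 2t - T - 1 instead of solving a ranking problem.
    """
    T = len(frames)
    dim = len(frames[0])
    pooled = [0.0] * dim
    for t, frame in enumerate(frames, start=1):
        alpha = 2 * t - T - 1
        for d in range(dim):
            pooled[d] += alpha * frame[d]
    return pooled


def signed_sqrt(vec):
    """Hypothetical non-linear feature function applied between layers."""
    return [math.copysign(math.sqrt(abs(x)), x) for x in vec]


def hierarchical_rank_pool(frames, window=4, stride=2, layers=2):
    """Stack pooling layers: pool overlapping windows, apply a non-linearity,
    repeat, then pool the remaining sequence into a single vector."""
    seq = [list(f) for f in frames]
    for _ in range(layers):
        if len(seq) < window:
            break  # sequence too short for another windowed layer
        seq = [
            signed_sqrt(approx_rank_pool(seq[s:s + window]))
            for s in range(0, len(seq) - window + 1, stride)
        ]
    return approx_rank_pool(seq)
```

Because the weights depend on frame position, reversing the input sequence changes the encoding, which is the property that lets such a representation capture temporal dynamics rather than just average appearance. In practice the per-frame vectors would be CNN features, and the final encoding would be passed to a classifier.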

Cite

Text

Fernando et al. "Discriminative Hierarchical Rank Pooling for Activity Recognition." Conference on Computer Vision and Pattern Recognition, 2016. doi:10.1109/CVPR.2016.212

Markdown

[Fernando et al. "Discriminative Hierarchical Rank Pooling for Activity Recognition." Conference on Computer Vision and Pattern Recognition, 2016.](https://mlanthology.org/cvpr/2016/fernando2016cvpr-discriminative/) doi:10.1109/CVPR.2016.212

BibTeX

@inproceedings{fernando2016cvpr-discriminative,
  title     = {{Discriminative Hierarchical Rank Pooling for Activity Recognition}},
  author    = {Fernando, Basura and Anderson, Peter and Hutter, Marcus and Gould, Stephen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2016},
  doi       = {10.1109/CVPR.2016.212},
  url       = {https://mlanthology.org/cvpr/2016/fernando2016cvpr-discriminative/}
}