Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition

Anoop Cherian, Piotr Koniusz, Stephen Gould

WACV 2017 pp. 130-138

doi:10.1109/WACV.2017.22

Abstract

Most successful deep learning algorithms for action recognition extend models designed for image-based tasks such as object recognition to video. Such extensions are typically trained for actions on single video frames or very short clips, and then their predictions from sliding windows over the video sequence are pooled for recognizing the action at the sequence level. Usually this pooling step uses the first-order statistics of frame-level action predictions. In this paper, we explore the advantages of using higher-order correlations. Specifically, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization: a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized into kernel feature maps. The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps and are used as input to a video-level classifier. We provide experiments on two fine-grained action recognition datasets, and show that our scheme leads to state-of-the-art results.
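
The pooling idea in the abstract can be illustrated for the second-order case with a short NumPy sketch. This is not the authors' implementation: it assumes an RBF similarity kernel on per-frame classifier scores in [0, 1], linearized with Gaussian bumps centred on evenly spaced pivots, and it stops at second-order co-occurrences. The function names (`feature_map`, `second_order_descriptor`) and the parameters `num_pivots` and `sigma` are hypothetical, and the paper's exact kernel, order, and normalization may differ.

```python
import numpy as np

def feature_map(scores, num_pivots=5, sigma=0.25):
    """Map each score in [0, 1] to Gaussian responses at evenly spaced pivots.

    Assumption: this pivot-based map stands in for the linearized kernel;
    it is one simple way to approximate an RBF kernel, not the paper's recipe.
    """
    pivots = np.linspace(0.0, 1.0, num_pivots)
    diff = scores[..., None] - pivots          # shape (..., num_pivots)
    return np.exp(-diff ** 2 / (2.0 * sigma ** 2))

def second_order_descriptor(scores, num_pivots=5, sigma=0.25):
    """Pool frame-level scores (T frames x C classes) into one video descriptor."""
    num_frames = scores.shape[0]
    # One feature vector per frame: all classes' feature maps concatenated.
    z = feature_map(scores, num_pivots, sigma).reshape(num_frames, -1)
    # Average outer product over frames: second-order co-occurrences.
    cooc = z.T @ z / num_frames
    # The matrix is symmetric, so keep only the upper triangle.
    return cooc[np.triu_indices(cooc.shape[0])]

# Toy input: 30 frames, 4 action classes, random scores standing in for CNN outputs.
rng = np.random.default_rng(0)
scores = rng.random((30, 4))
desc = second_order_descriptor(scores)
```

The resulting vector `desc` would then be passed to a video-level classifier such as a linear SVM; a first-order baseline would instead average `scores` over the frame axis.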

Cite

Text

Cherian et al. "Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2017. doi:10.1109/WACV.2017.22

Markdown

[Cherian et al. "Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2017.](https://mlanthology.org/wacv/2017/cherian2017wacv-higher/) doi:10.1109/WACV.2017.22

BibTeX

@inproceedings{cherian2017wacv-higher,
  title     = {{Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition}},
  author    = {Cherian, Anoop and Koniusz, Piotr and Gould, Stephen},
  booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
  year      = {2017},
  pages     = {130-138},
  doi       = {10.1109/WACV.2017.22},
  url       = {https://mlanthology.org/wacv/2017/cherian2017wacv-higher/}
}