Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition
Abstract
Most successful deep learning algorithms for action recognition extend models designed for image-based tasks such as object recognition to video. Such extensions are typically trained for actions on single video frames or very short clips, and then their predictions from sliding windows over the video sequence are pooled for recognizing the action at the sequence level. Usually this pooling step uses the first-order statistics of frame-level action predictions. In this paper, we explore the advantages of using higher-order correlations. In particular, we introduce Higher-order Kernel (HOK) descriptors generated from the late fusion of CNN classifier scores from all the frames in a sequence. To generate these descriptors, we use the idea of kernel linearization. Specifically, a similarity kernel matrix, which captures the temporal evolution of deep classifier scores, is first linearized into kernel feature maps. The HOK descriptors are then generated from the higher-order co-occurrences of these feature maps, and are then used as input to a video-level classifier. We provide experiments on two fine-grained action recognition datasets, and show that our scheme leads to state-of-the-art results.
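The pipeline outlined in the abstract (frame-level classifier scores, a linearized kernel, then pooling of feature-map co-occurrences) can be illustrated with a rough sketch. This is not the authors' HOK construction: it assumes an RBF kernel on per-frame score vectors, swaps in random Fourier features as the linearization, and stops at second-order pooling. The function names `rff_feature_map` and `second_order_pool`, the parameters `n_features` and `gamma`, and the synthetic scores are all hypothetical.

```python
import numpy as np


def rff_feature_map(scores, n_features=16, gamma=1.0, seed=0):
    """Approximate an RBF kernel exp(-gamma * ||x - y||^2) on per-frame
    score vectors with random Fourier features (an assumed stand-in for
    the kernel linearization step, not the paper's own scheme)."""
    rng = np.random.default_rng(seed)
    n_classes = scores.shape[1]
    # Frequencies drawn from N(0, 2 * gamma * I) match the RBF kernel above.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(n_classes, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(scores @ W + b)


def second_order_pool(features):
    """Average the outer products of the per-frame feature maps over time
    and keep the upper triangle of the symmetric co-occurrence matrix."""
    n_frames = features.shape[0]
    cooc = features.T @ features / n_frames
    rows, cols = np.triu_indices(cooc.shape[0])
    return cooc[rows, cols]


# Synthetic softmax scores for 30 frames and 5 action classes (made-up data).
rng = np.random.default_rng(1)
logits = rng.normal(size=(30, 5))
scores = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

phi = rff_feature_map(scores)        # (30, 16) linearized feature maps
descriptor = second_order_pool(phi)  # (136,) video-level descriptor
```

The resulting fixed-length `descriptor` could then be passed to any video-level classifier, such as a linear SVM. Orders above two would replace the matrix of outer products with a higher-order tensor.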
Cite
Text
Cherian et al. "Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2017. doi:10.1109/WACV.2017.22
Markdown
[Cherian et al. "Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2017.](https://mlanthology.org/wacv/2017/cherian2017wacv-higher/) doi:10.1109/WACV.2017.22
BibTeX
@inproceedings{cherian2017wacv-higher,
title = {{Higher-Order Pooling of CNN Features via Kernel Linearization for Action Recognition}},
author = {Cherian, Anoop and Koniusz, Piotr and Gould, Stephen},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2017},
pages = {130-138},
doi = {10.1109/WACV.2017.22},
url = {https://mlanthology.org/wacv/2017/cherian2017wacv-higher/}
}