Spatiotemporal Multiplier Networks for Video Action Recognition

Abstract

This paper presents a general ConvNet architecture for video action recognition based on multiplicative interactions of spacetime features. Our model combines the appearance and motion pathways of a two-stream architecture by motion gating and is trained end-to-end. We theoretically motivate multiplicative gating functions for residual networks and empirically study their effect on classification accuracy. To capture long-term dependencies, we inject identity mapping kernels for learning temporal relationships. Our architecture is fully convolutional in spacetime and able to evaluate a video in a single forward pass. Empirical investigation reveals that our model produces state-of-the-art results on two standard action recognition datasets.
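To give a flavor of the gating idea described above, the following is a minimal NumPy sketch of a multiplicatively gated residual unit, not the authors' exact formulation: a matrix product stands in for the convolutional residual transform, and plain feature vectors stand in for spacetime feature maps. The function and variable names (`gated_residual_block`, `x_app`, `x_mot`) are illustrative assumptions.

```python
import numpy as np

def gated_residual_block(x_app, x_mot, weight):
    """Sketch of a multiplicatively gated residual unit.

    Appearance features are modulated elementwise by motion features
    before the residual transform, so the identity shortcut carries
    appearance while the residual path mixes both streams.
    """
    gated = x_app * x_mot      # multiplicative motion gating
    residual = gated @ weight  # stand-in for a conv layer
    return x_app + residual    # identity shortcut

# Toy example with feature vectors instead of spacetime feature maps.
rng = np.random.default_rng(0)
x_app = rng.standard_normal((2, 4))  # "appearance" stream features
x_mot = rng.standard_normal((2, 4))  # "motion" stream features
W = np.eye(4) * 0.1                  # illustrative residual weights
out = gated_residual_block(x_app, x_mot, W)
```

The key point the sketch illustrates is that the cross-stream interaction is multiplicative rather than additive, so motion features gate which appearance responses pass through the residual path.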

Cite

Text

Feichtenhofer et al. "Spatiotemporal Multiplier Networks for Video Action Recognition." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.787

Markdown

[Feichtenhofer et al. "Spatiotemporal Multiplier Networks for Video Action Recognition." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/feichtenhofer2017cvpr-spatiotemporal/) doi:10.1109/CVPR.2017.787

BibTeX

@inproceedings{feichtenhofer2017cvpr-spatiotemporal,
  title     = {{Spatiotemporal Multiplier Networks for Video Action Recognition}},
  author    = {Feichtenhofer, Christoph and Pinz, Axel and Wildes, Richard P.},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.787},
  url       = {https://mlanthology.org/cvpr/2017/feichtenhofer2017cvpr-spatiotemporal/}
}