STM: SpatioTemporal and Motion Encoding for Action Recognition

Jiang, Boyuan; Wang, MengMeng; Gan, Weihao; Wu, Wei; Yan, Junjie

doi:10.1109/ICCV.2019.00209

ICCV 2019

Abstract

Spatiotemporal and motion features are two complementary and crucial types of information for video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn spatiotemporal features and a separate flow stream to learn motion features. In this work, we aim to encode both kinds of features efficiently within a unified 2D framework. To this end, we first propose an STM block, which contains a Channel-wise SpatioTemporal Module (CSTM) to represent spatiotemporal features and a Channel-wise Motion Module (CMM) to efficiently encode motion features. We then replace the original residual blocks in the ResNet architecture with STM blocks to form a simple yet effective STM network that adds only a small amount of extra computation. Extensive experiments demonstrate that, by encoding spatiotemporal and motion features together, the proposed STM network outperforms state-of-the-art methods on both temporal-related datasets (i.e., Something-Something v1 & v2 and Jester) and scene-related datasets (i.e., Kinetics-400, UCF-101, and HMDB-51).
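The abstract describes the STM block only at a high level. As a rough illustration, here is a minimal numpy sketch of one forward pass, assuming a common reading of the design: the CSTM path applies a channel-wise (depthwise) 1D convolution along time followed by a spatial convolution, the CMM path works in a channel-reduced space and takes the difference between a spatially convolved frame t+1 and frame t, and the two paths are summed inside a ResNet-style bottleneck with an identity shortcut. All names (`cstm`, `cmm`, `stm_block`, the `params` keys) are hypothetical. The depthwise 3x3 spatial kernels, summation fusion, random weights, and missing BatchNorm are simplifying assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of an STM-style block. Tensors use the layout
# (N, T, C, H, W): batch, frames, channels, height, width.
# BatchNorm, inner nonlinearities and the paper's exact hyperparameters are omitted.


def conv1x1(x, w):
    """1x1 convolution as a per-pixel channel projection; w has shape (C_in, C_out)."""
    return np.einsum("ntchw,cd->ntdhw", x, w)


def depthwise_temporal(x, k):
    """Channel-wise 1D conv over time, 'same' zero padding; k has shape (C, K), K odd."""
    t, c = x.shape[1], x.shape[2]
    pad = k.shape[1] // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0), (0, 0), (0, 0)))
    out = np.zeros(x.shape)
    for j in range(k.shape[1]):
        out += xp[:, j:j + t] * k[:, j].reshape(1, 1, c, 1, 1)
    return out


def depthwise_spatial(x, k):
    """Channel-wise 3x3 spatial conv, 'same' zero padding; k has shape (C, 3, 3)."""
    h, w = x.shape[-2:]
    xp = np.pad(x, ((0, 0), (0, 0), (0, 0), (1, 1), (1, 1)))
    out = np.zeros(x.shape)
    for i in range(3):
        for j in range(3):
            out += xp[..., i:i + h, j:j + w] * k[:, i, j][:, None, None]
    return out


def cstm(x, k_t, k_s):
    """CSTM-style path: channel-wise temporal conv, then a (here depthwise) spatial conv."""
    return depthwise_spatial(depthwise_temporal(x, k_t), k_s)


def cmm(x, w_red, k_s, w_exp):
    """CMM-style path: reduce channels, subtract frame t from the spatially
    convolved frame t+1, zero-pad the last time step, then restore channels."""
    r = conv1x1(x, w_red)
    diff = depthwise_spatial(r[:, 1:], k_s) - r[:, :-1]
    diff = np.concatenate([diff, np.zeros_like(r[:, :1])], axis=1)
    return conv1x1(diff, w_exp)


def stm_block(x, p):
    """Bottleneck residual block whose middle conv is replaced by CSTM + CMM."""
    y = conv1x1(x, p["reduce"])                          # C -> C_mid
    s = cstm(y, p["cstm_t"], p["cstm_s"])                # spatiotemporal path
    m = cmm(y, p["cmm_red"], p["cmm_s"], p["cmm_exp"])   # motion path
    out = conv1x1(s + m, p["expand"])                    # fuse by sum, C_mid -> C
    return np.maximum(x + out, 0.0)                      # identity shortcut + ReLU


rng = np.random.default_rng(0)
N, T, C, Cm, Cr, H, W = 2, 8, 16, 8, 2, 7, 7
params = {
    "reduce": rng.standard_normal((C, Cm)) * 0.1,
    "cstm_t": rng.standard_normal((Cm, 3)) * 0.1,
    "cstm_s": rng.standard_normal((Cm, 3, 3)) * 0.1,
    "cmm_red": rng.standard_normal((Cm, Cr)) * 0.1,
    "cmm_s": rng.standard_normal((Cr, 3, 3)) * 0.1,
    "cmm_exp": rng.standard_normal((Cr, Cm)) * 0.1,
    "expand": rng.standard_normal((Cm, C)) * 0.1,
}
x = rng.standard_normal((N, T, C, H, W))
out = stm_block(x, params)
print(out.shape)  # (2, 8, 16, 7, 7)
```

Because every operation in the sketch preserves the (N, T, C, H, W) shape at the block boundary, a block like this can stand in for a residual block without changing the surrounding network, which mirrors the drop-in replacement the abstract describes. Note also that in this sketch the motion path outputs exactly zero for a static clip when its spatial kernel is the identity, since consecutive frames then cancel.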

Cite

Text

Jiang et al. "STM: SpatioTemporal and Motion Encoding for Action Recognition." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00209

Markdown

[Jiang et al. "STM: SpatioTemporal and Motion Encoding for Action Recognition." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/jiang2019iccv-stm/) doi:10.1109/ICCV.2019.00209

BibTeX

@inproceedings{jiang2019iccv-stm,
  title     = {{STM: SpatioTemporal and Motion Encoding for Action Recognition}},
  author    = {Jiang, Boyuan and Wang, MengMeng and Gan, Weihao and Wu, Wei and Yan, Junjie},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00209},
  url       = {https://mlanthology.org/iccv/2019/jiang2019iccv-stm/}
}