StNet: Local and Global Spatial-Temporal Modeling for Action Recognition

Abstract

Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this paper, in contrast to the existing CNN+RNN or pure 3D convolution based approaches, we explore a novel spatial-temporal network (StNet) architecture for both local and global modeling in videos. In particular, StNet stacks N successive video frames into a super-image with 3N channels and applies 2D convolution on super-images to capture local spatial-temporal relationships. To model global spatial-temporal structure, we apply temporal convolution on the local spatial-temporal feature maps. Specifically, StNet introduces a novel temporal Xception block, which employs separate channel-wise and temporal-wise convolutions over the feature sequence of a video. Extensive experiments on the Kinetics dataset demonstrate that our framework outperforms several state-of-the-art approaches in action recognition and can strike a satisfying trade-off between recognition accuracy and model complexity. We further demonstrate the generalization performance of the learned video representations on the UCF101 dataset.
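The two mechanisms described in the abstract, super-image stacking and a separable temporal convolution, can be illustrated with a short NumPy sketch. This is not the authors' implementation. It rests on several assumptions. Frames are channel-first arrays. Super-images are built from non-overlapping groups of N consecutive frames. Global average pooling stands in for the 2D CNN backbone. The temporal block is modeled as a depthwise-separable factorization: a per-channel convolution along time, followed by a 1×1 convolution that mixes channels. All function names (`make_super_images`, `depthwise_temporal_conv`, `separable_temporal_block`, `param_counts`) are hypothetical. Any normalization, activation or shortcut branches that the real block may use are omitted.

```python
import numpy as np


def make_super_images(frames: np.ndarray, n: int) -> np.ndarray:
    """Stack every n consecutive RGB frames into one super-image with 3n channels.

    frames: (num_frames, 3, H, W), channel-first; num_frames must be a multiple of n.
    Returns (num_frames // n, 3 * n, H, W); frame k of a group fills channels [3k, 3k + 3).
    """
    num_frames, c, h, w = frames.shape
    if c != 3 or num_frames % n != 0:
        raise ValueError("expected RGB frames and num_frames divisible by n")
    # A C-order reshape keeps each frame's 3 channels contiguous inside its group.
    return frames.reshape(num_frames // n, n * c, h, w)


def depthwise_temporal_conv(x: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Convolve each channel of a (T, C) feature sequence along time with its own kernel.

    kernels: (C, K) with K odd; zero padding keeps the output length equal to T.
    """
    t = x.shape[0]
    k = kernels.shape[1]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad the time axis only
    out = np.zeros(x.shape, dtype=np.result_type(x, kernels))
    for i in range(k):
        # Tap i of every per-channel kernel (cross-correlation, as in CNN layers).
        out += xp[i:i + t, :] * kernels[:, i]
    return out


def separable_temporal_block(x: np.ndarray, dw_kernels: np.ndarray,
                             pw_weights: np.ndarray) -> np.ndarray:
    """Hypothetical Xception-style block: per-channel temporal conv, then 1x1 channel mixing.

    x: (T, C_in); dw_kernels: (C_in, K); pw_weights: (C_in, C_out). Returns (T, C_out).
    """
    return depthwise_temporal_conv(x, dw_kernels) @ pw_weights


def param_counts(c_in: int, c_out: int, k: int) -> tuple[int, int]:
    """Weights (no biases) of a full temporal conv vs. the separable factorization."""
    full = c_in * c_out * k
    separable = c_in * k + c_in * c_out
    return full, separable


# Toy pipeline: 16 frames -> 4 super-images -> (4, 12) features -> (4, 5) -> (5,) clip score.
rng = np.random.default_rng(0)
frames = rng.random((16, 3, 8, 8))
supers = make_super_images(frames, n=4)
print(supers.shape)  # (4, 12, 8, 8)
features = supers.mean(axis=(2, 3))  # stand-in for a 2D CNN over each super-image
out = separable_temporal_block(features, rng.random((12, 3)), rng.random((12, 5)))
print(out.shape)  # (4, 5)
clip_scores = out.mean(axis=0)  # average over the temporal axis
print(param_counts(1024, 1024, 3))  # (3145728, 1051648)
```

The parameter comparison shows one plausible reason a separable temporal block keeps model complexity low. Take 1024 input and output channels and a temporal kernel of size 3. A full temporal convolution then needs 3,145,728 weights, while the depthwise-plus-pointwise form needs 1,051,648.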

Cite

Text

He et al. "StNet: Local and Global Spatial-Temporal Modeling for Action Recognition." AAAI Conference on Artificial Intelligence, 2019. doi:10.1609/AAAI.V33I01.33018401

Markdown

[He et al. "StNet: Local and Global Spatial-Temporal Modeling for Action Recognition." AAAI Conference on Artificial Intelligence, 2019.](https://mlanthology.org/aaai/2019/he2019aaai-stnet/) doi:10.1609/AAAI.V33I01.33018401

BibTeX

@inproceedings{he2019aaai-stnet,
  title     = {{StNet: Local and Global Spatial-Temporal Modeling for Action Recognition}},
  author    = {He, Dongliang and Zhou, Zhichao and Gan, Chuang and Li, Fu and Liu, Xiao and Li, Yandong and Wang, Limin and Wen, Shilei},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2019},
  pages     = {8401-8408},
  doi       = {10.1609/AAAI.V33I01.33018401},
  url       = {https://mlanthology.org/aaai/2019/he2019aaai-stnet/}
}