Spatio-Temporal Channel Correlation Networks for Action Classification

Diba, Ali; Fayyaz, Mohsen; Sharma, Vivek; Arzani, M. Mahdi; Yousefzadeh, Rahman; Gall, Juergen; Van Gool, Luc

doi:10.1007/978-3-030-01225-0_18

Ali Diba, Mohsen Fayyaz, Vivek Sharma, M. Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool

ECCV 2018

Abstract

This paper is driven by the question of whether spatio-temporal correlations are enough for 3D convolutional neural networks (CNNs). Most traditional 3D networks use local spatio-temporal features. We introduce a new block that models correlations between the channels of a 3D CNN with respect to temporal and spatial features. This block can be added as a residual unit to different parts of 3D CNNs. We name it 'Spatio-Temporal Channel Correlation' (STC). By embedding this block into current state-of-the-art architectures such as ResNeXt and ResNet, we improve performance by 2-3% on the Kinetics dataset. Our experiments show that adding STC blocks to these architectures outperforms state-of-the-art methods on the HMDB51, UCF101 and Kinetics datasets. A second issue is that 3D CNNs are typically trained from scratch, which requires a huge labeled dataset to reach reasonable performance, so the knowledge learned by 2D CNNs is completely ignored. Another contribution of this work is a simple and effective technique for transferring knowledge from a pre-trained 2D CNN to a randomly initialized 3D CNN for stable weight initialization. This allows us to significantly reduce the number of training samples for 3D CNNs. By fine-tuning this network, we beat the performance of generic and recent 3D CNN methods that were trained on large video datasets, e.g. Sports-1M, and fine-tuned on the target datasets, e.g. HMDB51/UCF101.
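
The abstract does not spell out the internals of the STC block, so the following is only a rough NumPy sketch of the general idea of channel-wise gating added as a residual unit. It assumes a squeeze-and-excitation-style design with two branches, one pooling over time and space and one pooling over space only. The function names (`init_params`, `channel_gate`, `stc_like_block`), the reduction ratio, and the way the branches are combined are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np


def init_params(channels, reduction, rng):
    """Random bottleneck weights for the two (assumed) gating branches."""
    hidden = max(channels // reduction, 1)
    return {
        "a1": rng.standard_normal((channels, hidden)) * 0.1,
        "a2": rng.standard_normal((hidden, channels)) * 0.1,
        "b1": rng.standard_normal((channels, hidden)) * 0.1,
        "b2": rng.standard_normal((hidden, channels)) * 0.1,
    }


def channel_gate(desc, w1, w2):
    """Bottleneck MLP (ReLU, then sigmoid) mapping channel descriptors to gates in (0, 1)."""
    hidden = np.maximum(desc @ w1, 0.0)
    return 1.0 / (1.0 + np.exp(-(hidden @ w2)))


def stc_like_block(x, params):
    """Apply an assumed two-branch channel gating to x of shape (C, T, H, W)."""
    # Branch A (assumption): pool over time and space, one descriptor per channel.
    desc_a = x.mean(axis=(1, 2, 3))                              # (C,)
    gate_a = channel_gate(desc_a, params["a1"], params["a2"])    # (C,)
    branch_a = x * gate_a[:, None, None, None]

    # Branch B (assumption): pool over space only, one descriptor per channel per frame.
    desc_b = x.mean(axis=(2, 3)).T                               # (T, C)
    gate_b = channel_gate(desc_b, params["b1"], params["b2"])    # (T, C)
    branch_b = x * gate_b.T[:, :, None, None]

    # Residual connection: the input is kept and the gated branches are added on top.
    return x + branch_a + branch_b


rng = np.random.default_rng(0)
clip_features = rng.standard_normal((8, 4, 5, 5))  # (channels, frames, height, width)
out = stc_like_block(clip_features, init_params(8, reduction=4, rng=rng))
print(out.shape)
```

Because the output has the same shape as the input, a unit of this kind could in principle be dropped between existing 3D convolutional stages, which is the property the abstract highlights when it says the block can be added to different parts of a 3D CNN.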

Cite

Text

Diba et al. "Spatio-Temporal Channel Correlation Networks for Action Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01225-0_18

Markdown

[Diba et al. "Spatio-Temporal Channel Correlation Networks for Action Classification." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/diba2018eccv-spatiotemporal/) doi:10.1007/978-3-030-01225-0_18

BibTeX

@inproceedings{diba2018eccv-spatiotemporal,
  title     = {{Spatio-Temporal Channel Correlation Networks for Action Classification}},
  author    = {Diba, Ali and Fayyaz, Mohsen and Sharma, Vivek and Arzani, M. Mahdi and Yousefzadeh, Rahman and Gall, Juergen and Van Gool, Luc},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01225-0_18},
  url       = {https://mlanthology.org/eccv/2018/diba2018eccv-spatiotemporal/}
}