Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation
Abstract
Strategies based on deep convolution networks have shown remarkable performance in a range of recognition tasks. In many realistic scenarios, however, accurate and robust recognition remains hard, especially for videos. Challenges such as cluttered backgrounds and viewpoint changes can produce large intrinsic and extrinsic class variations. In addition, data deficiency can degrade the model during learning and updating. Incorporating frame-wise motion into the learning model on the fly has therefore become increasingly attractive in contemporary video analysis. To overcome these limitations, we propose an approach based on deeper convolution networks with pairwise motion concatenation, named deep temporal convolutional networks. A temporal motion accumulation mechanism is introduced as an effective form of input for training the convolution networks. To handle possible data deficiency, we also transfer ResNet-101 weights and apply data variation augmentation for robust recognition. Experiments on the challenging UCF101 dataset and the ODAR dataset show favorable performance compared with other state-of-the-art methods.
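The abstract does not spell out how pairwise frame motion is computed or concatenated. As a rough illustration only, the sketch below uses plain frame differencing between consecutive frames as a stand-in for the motion signal and stacks the results along the channel axis to form a single network input. The function name `pairwise_motion_stack`, the `(T, H, W, C)` layout, and the choice of frame differencing are all assumptions made for this example and are not taken from the paper.

```python
import numpy as np


def pairwise_motion_stack(frames: np.ndarray) -> np.ndarray:
    """Stack differences of consecutive frame pairs along the channel axis.

    Hypothetical helper: frame differencing is an assumed stand-in for the
    paper's motion representation, which the abstract does not define.

    frames: array of shape (T, H, W, C) with T >= 2.
    returns: array of shape (H, W, (T - 1) * C).
    """
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("expected frames of shape (T, H, W, C) with T >= 2")

    # Cast first so uint8 subtraction cannot wrap around.
    frames = frames.astype(np.float32)

    # One difference image per consecutive pair: shape (T - 1, H, W, C).
    diffs = frames[1:] - frames[:-1]

    # Move the pair axis next to the channel axis, then merge the two.
    pairs, height, width, channels = diffs.shape
    return diffs.transpose(1, 2, 0, 3).reshape(height, width, pairs * channels)


# Toy clip: 5 random RGB frames of size 8x8.
clip = np.random.default_rng(0).integers(0, 256, size=(5, 8, 8, 3), dtype=np.uint8)
stacked = pairwise_motion_stack(clip)
print(stacked.shape)  # (8, 8, 12)
```

With 5 frames there are 4 consecutive pairs, so the stacked input has 4 × 3 = 12 channels. A network pretrained on 3-channel images, such as ResNet-101, would need its first convolution adapted to accept an input of this kind.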
Cite
Text
Han et al. "Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.162
Markdown
[Han et al. "Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/han2017cvprw-video/) doi:10.1109/CVPRW.2017.162
BibTeX
@inproceedings{han2017cvprw-video,
title = {{Video Action Recognition Based on Deeper Convolution Networks with Pair-Wise Frame Motion Concatenation}},
author = {Han, Yamin and Zhang, Peng and Zhuo, Tao and Huang, Wei and Zhang, Yanning},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2017},
pages = {1226-1235},
doi = {10.1109/CVPRW.2017.162},
url = {https://mlanthology.org/cvprw/2017/han2017cvprw-video/}
}