Human Action Recognition Based on Temporal Pose CNN and Multi-Dimensional Fusion
Abstract
To take advantage of recent advances in human pose estimation from images, we develop a deep neural network model for action recognition from videos by computing temporal human pose features with a 3D CNN model. The proposed temporal pose features provide more discriminative human action information than previous video features, such as appearance and short-term motion. In addition, we propose a novel fusion network that combines temporal pose, spatial, and motion feature maps for classification by bridging the dimensional gap between 3D and 2D CNN feature maps. We show that the proposed action recognition system provides superior accuracy compared to previous methods through experiments on the Sub-JHMDB and PennAction datasets.
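The abstract only sketches how the fusion network reconciles 3D and 2D feature maps. As an illustration (not the paper's actual architecture, and all names and shapes below are assumptions), one common way to bridge the two is to pool away the temporal axis of the 3D-CNN output so it matches the 2D maps' channel/height/width layout, then concatenate channel-wise:

```python
import numpy as np

# Hypothetical feature maps; channel counts and spatial sizes are illustrative.
# 3D-CNN temporal-pose features: (channels, time, height, width)
pose3d = np.random.rand(64, 8, 14, 14)
# 2D-CNN spatial and motion features: (channels, height, width)
spatial = np.random.rand(128, 14, 14)
motion = np.random.rand(128, 14, 14)

def bridge_3d_to_2d(feat3d):
    """Collapse the temporal axis (here by average pooling) so a
    3D feature map matches the 2D maps' (C, H, W) layout."""
    return feat3d.mean(axis=1)

def fuse(*maps):
    """Channel-wise concatenation of dimension-matched feature maps."""
    return np.concatenate(maps, axis=0)

fused = fuse(bridge_3d_to_2d(pose3d), spatial, motion)
print(fused.shape)  # (64 + 128 + 128, 14, 14)
```

A classifier head would then operate on the fused `(320, 14, 14)` tensor; the paper's actual fusion strategy may differ from this average-pooling sketch.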
Cite
Text
Huang et al. "Human Action Recognition Based on Temporal Pose CNN and Multi-Dimensional Fusion." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11012-3_33
Markdown
[Huang et al. "Human Action Recognition Based on Temporal Pose CNN and Multi-Dimensional Fusion." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/huang2018eccvw-human/) doi:10.1007/978-3-030-11012-3_33
BibTeX
@inproceedings{huang2018eccvw-human,
title = {{Human Action Recognition Based on Temporal Pose CNN and Multi-Dimensional Fusion}},
author = {Huang, Yi and Lai, Shang-Hong and Tai, Shao-Heng},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
  pages = {426--440},
doi = {10.1007/978-3-030-11012-3_33},
url = {https://mlanthology.org/eccvw/2018/huang2018eccvw-human/}
}