ActionFlowNet: Learning Motion Representation for Action Recognition
Abstract
We present a data-efficient approach to learning video representations from a small amount of labeled data. We propose ActionFlowNet, a multitask learning model that trains a single-stream convolutional neural network directly from raw pixels to jointly estimate optical flow while recognizing actions, capturing both appearance and motion in a single model. The model effectively learns video representations from motion information in unlabeled videos. It improves action recognition accuracy by a large margin (23.6%) over state-of-the-art CNN-based unsupervised representation learning methods trained without external large-scale data or additional optical flow input. Without pretraining on large external labeled datasets, our model, by effectively exploiting motion information, achieves recognition accuracy competitive with models trained on large labeled datasets such as ImageNet and Sports-1M.
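To make the multitask idea concrete, below is a minimal sketch (not the authors' released code) of a single-stream network whose shared convolutional encoder feeds both a flow-regression head and an action-classification head, trained with a joint loss. The layer sizes, the endpoint-error flow loss, and the 0.5 loss weight are illustrative assumptions rather than values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskActionFlowSketch(nn.Module):
    """Hypothetical single-stream multitask model: shared encoder + flow and action heads."""

    def __init__(self, num_classes: int, in_frames: int = 2):
        super().__init__()
        # Shared encoder over a stack of RGB frames (in_frames * 3 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(in_frames * 3, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, 5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Flow head: upsample shared features back to a 2-channel flow field.
        self.flow_head = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 2, 4, stride=2, padding=1),
        )
        # Action head: global average pooling of shared features, then a linear classifier.
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, frames: torch.Tensor):
        feat = self.encoder(frames)                      # shared appearance/motion representation
        flow = self.flow_head(feat)                      # (B, 2, H, W) optical flow estimate
        logits = self.classifier(feat.mean(dim=(2, 3)))  # (B, num_classes) action scores
        return flow, logits


def multitask_loss(flow_pred, flow_gt, logits, labels, flow_weight: float = 0.5):
    """Joint objective: endpoint error on flow plus cross-entropy on action labels."""
    epe = torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
    cls = F.cross_entropy(logits, labels)
    return cls + flow_weight * epe
```

In this setup the flow targets could come from any off-the-shelf flow estimator run on unlabeled videos, so the flow branch supervises the shared motion features even when action labels are scarce, which is the intuition behind the paper's data-efficient representation learning.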
Cite
Text
Ng et al. "ActionFlowNet: Learning Motion Representation for Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00179
Markdown
[Ng et al. "ActionFlowNet: Learning Motion Representation for Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/ng2018wacv-actionflownet/) doi:10.1109/WACV.2018.00179
BibTeX
@inproceedings{ng2018wacv-actionflownet,
title = {{ActionFlowNet: Learning Motion Representation for Action Recognition}},
author = {Ng, Joe Yue-Hei and Choi, Jonghyun and Neumann, Jan and Davis, Larry S.},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2018},
pages = {1616--1624},
doi = {10.1109/WACV.2018.00179},
url = {https://mlanthology.org/wacv/2018/ng2018wacv-actionflownet/}
}