Temporal Difference Networks for Video Action Recognition
Abstract
Deep convolutional neural networks have achieved great success on image-based recognition tasks. However, it remains unclear how to model the temporal evolution of videos effectively with deep networks. While recent deep models for videos show improvement by incorporating optical flow or aggregating high-level appearance across frames, they focus on modeling either long-term temporal relations or short-term motion. We propose Temporal Difference Networks (TDN), which model both long-term relations and short-term motion in videos. We leverage a simple but effective motion representation, the difference of CNN features, and jointly model motion at multiple scales in a single CNN. TDN achieves state-of-the-art performance on three different video classification benchmarks, demonstrating the effectiveness of our approach for learning temporal relations in videos.
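The core motion representation described above, the difference of CNN features between frames, can be sketched in a few lines. This is a hedged illustration, not the authors' implementation: the function name and the toy feature tensors are hypothetical, and per-frame features are assumed to be stacked along a leading time axis.

```python
import numpy as np

def temporal_feature_difference(features):
    """Sketch of the motion representation used by TDN-style models:
    given per-frame CNN feature maps stacked along the time axis with
    shape (T, C, H, W), subtract consecutive frames to obtain a simple
    motion signal of shape (T-1, C, H, W). Names here are illustrative,
    not from the paper's code."""
    features = np.asarray(features, dtype=np.float64)
    # Difference of CNN features between adjacent time steps
    return features[1:] - features[:-1]

# Toy example: 3 "frames" of 2-channel 2x2 feature maps
feats = np.arange(3 * 2 * 2 * 2, dtype=np.float64).reshape(3, 2, 2, 2)
diffs = temporal_feature_difference(feats)
print(diffs.shape)  # (2, 2, 2, 2)
```

In a full network, such differences would be computed at several feature scales (e.g. after different convolutional stages) and fused with the appearance stream, so that short-term motion and long-term relations are modeled jointly in one CNN.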
Cite
Text
Ng and Davis. "Temporal Difference Networks for Video Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018. doi:10.1109/WACV.2018.00176
Markdown
[Ng and Davis. "Temporal Difference Networks for Video Action Recognition." IEEE/CVF Winter Conference on Applications of Computer Vision, 2018.](https://mlanthology.org/wacv/2018/ng2018wacv-temporal/) doi:10.1109/WACV.2018.00176
BibTeX
@inproceedings{ng2018wacv-temporal,
title = {{Temporal Difference Networks for Video Action Recognition}},
author = {Ng, Joe Yue-Hei and Davis, Larry S.},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2018},
pages = {1587--1596},
doi = {10.1109/WACV.2018.00176},
url = {https://mlanthology.org/wacv/2018/ng2018wacv-temporal/}
}