Deep Feature Flow for Video Recognition

Abstract

Deep convolutional neural networks have achieved great success on image recognition tasks. Yet it is non-trivial to transfer state-of-the-art image recognition networks to videos, as per-frame evaluation is too slow and unaffordable. We present deep feature flow, a fast and accurate framework for video recognition. It runs the expensive convolutional sub-network only on sparse key frames and propagates their deep feature maps to other frames via a flow field. It achieves significant speedup because flow computation is relatively fast. End-to-end training of the whole architecture significantly boosts recognition accuracy. Deep feature flow is flexible and general. It is validated on two recent large-scale video datasets, and it makes a large step towards practical video recognition. Code will be released.
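
The key-frame scheme described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the names `warp_features`, `run_video`, `feature_net`, `flow_net`, and `task_net` are hypothetical, and the fixed `key_interval` schedule, the bilinear warp, and the flow convention (displacement from each current-frame location to its match in the key frame) are assumptions made for illustration.

```python
import numpy as np

def warp_features(feat, flow):
    """Bilinearly warp key-frame features (C, H, W) to the current frame.

    flow has shape (2, H, W); flow[0] is dx and flow[1] is dy, giving for each
    current-frame location the offset to sample from in the key-frame features
    (an assumed convention for this sketch).
    """
    c, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions, clamped to the feature-map borders.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two neighbouring rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bot = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bot * wy

def run_video(frames, feature_net, flow_net, task_net, key_interval=10):
    """Run the expensive feature_net only on key frames; warp features elsewhere.

    feature_net, flow_net, and task_net are placeholder callables standing in
    for the heavy backbone, the cheap flow estimator, and the recognition head.
    """
    outputs = []
    key_frame = key_feat = None
    for i, frame in enumerate(frames):
        if i % key_interval == 0:
            # Key frame: pay for the full feature extraction.
            key_frame = frame
            key_feat = feature_net(frame)
            feat = key_feat
        else:
            # Non-key frame: estimate flow and propagate the key-frame features.
            flow = flow_net(key_frame, frame)
            feat = warp_features(key_feat, flow)
        outputs.append(task_net(feat))
    return outputs
```

With `key_interval=10`, the backbone runs on one frame in ten and the remaining frames cost only a flow estimate, a warp, and the task head, which is where the speedup described above would come from under these assumptions.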

Cite

Text

Zhu et al. "Deep Feature Flow for Video Recognition." Conference on Computer Vision and Pattern Recognition, 2017. doi:10.1109/CVPR.2017.441

Markdown

[Zhu et al. "Deep Feature Flow for Video Recognition." Conference on Computer Vision and Pattern Recognition, 2017.](https://mlanthology.org/cvpr/2017/zhu2017cvpr-deep/) doi:10.1109/CVPR.2017.441

BibTeX

@inproceedings{zhu2017cvpr-deep,
  title     = {{Deep Feature Flow for Video Recognition}},
  author    = {Zhu, Xizhou and Xiong, Yuwen and Dai, Jifeng and Yuan, Lu and Wei, Yichen},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2017},
  doi       = {10.1109/CVPR.2017.441},
  url       = {https://mlanthology.org/cvpr/2017/zhu2017cvpr-deep/}
}