Motion Feature Network: Fixed Motion Filter for Action Recognition

Abstract

Spatio-temporal representations in frame sequences play an important role in the task of action recognition. Previously, methods that use optical flow as temporal information in combination with a set of RGB images containing spatial information have shown great performance gains on action recognition tasks. However, this approach is computationally expensive and requires a two-stream (RGB and optical flow) framework. In this paper, we propose MFNet (Motion Feature Network), which contains motion blocks that encode spatio-temporal information between adjacent frames in a unified network trainable end-to-end. The motion block can be attached to any existing CNN-based action recognition framework at only a small additional cost. We evaluate our network on two action recognition datasets (Jester and Something-Something) and achieve competitive performance on both by training the networks from scratch.
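
To make the abstract's description more concrete, the following minimal PyTorch sketch shows one way such a motion block could look: it forms directional difference maps between the feature maps of two adjacent frames and fuses them back to the backbone's feature shape, so the block can be inserted into an existing CNN. The specific shift set, the 1x1 fusion convolution, the residual connection, and the name MotionBlockSketch are all illustrative assumptions, not the paper's exact fixed motion filters.

import torch
import torch.nn as nn


class MotionBlockSketch(nn.Module):
    """Toy motion block: compares a frame's feature map against spatially
    shifted versions of the next frame's feature map. The fixed shift set
    stands in for the paper's fixed motion filters; the 1x1 fusion conv
    and the residual connection are assumptions, not the paper's design."""

    # zero shift plus eight unit shifts, as (dx, dy) pairs (assumed set)
    SHIFTS = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),
              (1, 1), (1, -1), (-1, 1), (-1, -1)]

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 convolution fuses the stacked difference maps back to `channels`
        self.fuse = nn.Conv2d(channels * len(self.SHIFTS), channels, kernel_size=1)

    def forward(self, feat_t: torch.Tensor, feat_next: torch.Tensor) -> torch.Tensor:
        # feat_t, feat_next: (N, C, H, W) features from two adjacent frames
        diffs = []
        for dx, dy in self.SHIFTS:
            # cyclic shift for simplicity; a faithful version would zero-pad
            shifted = torch.roll(feat_next, shifts=(dy, dx), dims=(2, 3))
            diffs.append(feat_t - shifted)  # directional difference map
        motion = torch.cat(diffs, dim=1)    # (N, C * 9, H, W)
        return feat_t + self.fuse(motion)   # residual fusion (assumed)


if __name__ == "__main__":
    block = MotionBlockSketch(channels=64)
    f_t = torch.randn(2, 64, 28, 28)   # frame t features
    f_n = torch.randn(2, 64, 28, 28)   # frame t+1 features
    out = block(f_t, f_n)
    print(out.shape)                   # torch.Size([2, 64, 28, 28])

Because the output has the same shape as the input feature map, the block can be dropped between stages of a standard backbone, which is what allows the abstract's claim of attachment "with only a small additional cost" in this sketch.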

Cite

Text

Lee et al. "Motion Feature Network: Fixed Motion Filter for Action Recognition." Proceedings of the European Conference on Computer Vision (ECCV), 2018. doi:10.1007/978-3-030-01249-6_24

Markdown

[Lee et al. "Motion Feature Network: Fixed Motion Filter for Action Recognition." Proceedings of the European Conference on Computer Vision (ECCV), 2018.](https://mlanthology.org/eccv/2018/lee2018eccv-motion/) doi:10.1007/978-3-030-01249-6_24

BibTeX

@inproceedings{lee2018eccv-motion,
  title     = {{Motion Feature Network: Fixed Motion Filter for Action Recognition}},
  author    = {Lee, Myunggi and Lee, Seungeui and Son, Sungjoon and Park, Gyutae and Kwak, Nojun},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2018},
  doi       = {10.1007/978-3-030-01249-6_24},
  url       = {https://mlanthology.org/eccv/2018/lee2018eccv-motion/}
}