Dance with Flow: Two-in-One Stream Action Detection

Abstract

The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, which the motion modulation layer uses to generate transformation parameters for modulating the low-level RGB features. The method is easily embedded in existing appearance-stream or two-stream action detection networks and is trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCFSports and J-HMDB.
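
The modulation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the "transformation parameters" are an element-wise scale and shift applied to the RGB features, and it replaces the learned motion condition and motion modulation layers with fixed scalar maps. The names `linear_map` and `modulate`, and all weights, are hypothetical.

```python
# Hypothetical sketch: element-wise affine modulation of RGB features by
# parameters derived from flow features. Names and weights are illustrative
# assumptions, not taken from the paper.


def linear_map(features, weight, bias):
    """Stand-in for a learned layer: applies weight * x + bias to every element."""
    return [[weight * x + bias for x in row] for row in features]


def modulate(rgb_features, flow_features):
    """Return scale * rgb + shift, with scale and shift computed from flow."""
    # Stand-in for the motion condition layer: extract motion information.
    condition = linear_map(flow_features, weight=0.5, bias=0.0)
    # Stand-in for the motion modulation layer: produce the two parameters.
    scale = linear_map(condition, weight=1.0, bias=1.0)  # assumed: equals 1 when flow is 0
    shift = linear_map(condition, weight=0.1, bias=0.0)  # assumed: equals 0 when flow is 0
    return [
        [s * r + t for r, s, t in zip(rgb_row, scale_row, shift_row)]
        for rgb_row, scale_row, shift_row in zip(rgb_features, scale, shift)
    ]


rgb = [[1.0, 2.0], [3.0, 4.0]]   # toy low-level RGB feature map
flow = [[0.0, 2.0], [0.0, 0.0]]  # toy flow map: motion at one location only
modulated = modulate(rgb, flow)
```

In this toy setup, locations with zero flow pass the RGB features through unchanged, while the single location with motion is rescaled and shifted. A single stream therefore carries both appearance and motion cues, which is the intuition behind the reduced parameter count the abstract reports.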

Cite

Text

Zhao and Snoek. "Dance with Flow: Two-in-One Stream Action Detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.01017

Markdown

[Zhao and Snoek. "Dance with Flow: Two-in-One Stream Action Detection." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/zhao2019cvpr-dance/) doi:10.1109/CVPR.2019.01017

BibTeX

@inproceedings{zhao2019cvpr-dance,
  title     = {{Dance with Flow: Two-in-One Stream Action Detection}},
  author    = {Zhao, Jiaojiao and Snoek, Cees G. M.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.01017},
  url       = {https://mlanthology.org/cvpr/2019/zhao2019cvpr-dance/}
}