Improving Action Localization by Progressive Cross-Stream Cooperation

Abstract

Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework that iteratively improves action localization results and generates better bounding boxes for one stream (i.e., Flow/RGB) by leveraging both region proposals and features from the other stream (i.e., RGB/Flow). Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models. Second, we propose a new message passing approach that passes information from one stream to the other in order to learn better representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy to train class-specific actionness detectors for better temporal segmentation, which can be readily learnt by using the training samples around temporal boundaries. Comprehensive experiments on two benchmark datasets, UCF-101-24 and J-HMDB, demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.
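The first step described in the abstract, combining the latest region proposals from both streams to obtain a larger pool of labelled training samples, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`iou`, `combine_proposals`, `label_proposals`), the near-duplicate filter and its `dup_iou` threshold, and the `pos_iou` labelling threshold are all assumptions introduced here for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def combine_proposals(rgb: List[Box], flow: List[Box],
                      dup_iou: float = 0.9) -> List[Box]:
    """Pool proposals from both streams.

    Dropping near-duplicates (IoU >= dup_iou) is an assumption of this
    sketch; the abstract only states that the two sets are combined.
    """
    combined: List[Box] = []
    for box in list(rgb) + list(flow):
        if all(iou(box, kept) < dup_iou for kept in combined):
            combined.append(box)
    return combined


def label_proposals(proposals: List[Box], gt_boxes: List[Box],
                    pos_iou: float = 0.5) -> List[bool]:
    """Mark a proposal positive if it overlaps any ground-truth box enough.

    The 0.5 threshold is a common detection convention, assumed here.
    """
    return [any(iou(p, g) >= pos_iou for g in gt_boxes) for p in proposals]


# Hypothetical proposals for a single frame.
rgb_props = [(0.0, 0.0, 10.0, 10.0)]
flow_props = [(0.0, 0.0, 10.0, 10.0), (20.0, 20.0, 30.0, 30.0)]
pooled = combine_proposals(rgb_props, flow_props)
labels = label_proposals(pooled, gt_boxes=[(1.0, 1.0, 10.0, 10.0)])
```

In this toy frame the pooled set holds more boxes than the RGB stream alone, so the detector for that stream sees more labelled samples. In the iterative setting the abstract describes, a step of this kind would be repeated with the refreshed proposals of each stream.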

Cite

Text

Su et al. "Improving Action Localization by Progressive Cross-Stream Cooperation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019. doi:10.1109/CVPR.2019.01229

Markdown

[Su et al. "Improving Action Localization by Progressive Cross-Stream Cooperation." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019.](https://mlanthology.org/cvpr/2019/su2019cvpr-improving/) doi:10.1109/CVPR.2019.01229

BibTeX

@inproceedings{su2019cvpr-improving,
  title     = {{Improving Action Localization by Progressive Cross-Stream Cooperation}},
  author    = {Su, Rui and Ouyang, Wanli and Zhou, Luping and Xu, Dong},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2019},
  doi       = {10.1109/CVPR.2019.01229},
  url       = {https://mlanthology.org/cvpr/2019/su2019cvpr-improving/}
}