Active Vision for Early Recognition of Human Actions

Abstract

We propose a method for early recognition of human actions that can take advantage of multiple cameras while satisfying the constraints imposed by limited communication bandwidth and processing power. Our method considers multiple cameras and, at each time step, decides which camera to use so that a confident recognition decision can be reached as soon as possible. We formulate the camera selection problem as a sequential decision process and learn a view selection policy using reinforcement learning. We also develop a novel recurrent neural network architecture to account for the unobserved video frames and the irregular intervals between the observed frames. Experiments on three datasets demonstrate the effectiveness of our approach for early recognition of human actions.
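The sequential decision loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`early_recognition`, `select_camera`), the additive evidence update, and the fixed confidence threshold are all assumptions made for illustration. In the paper the view selection policy is learned with reinforcement learning and the belief is produced by a recurrent network; here the policy is simply a caller-supplied function and the belief is a softmax over accumulated scores.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def early_recognition(
    views: Sequence[Sequence[Sequence[float]]],
    select_camera: Callable[[int, List[float]], int],
    threshold: float = 0.9,
) -> Tuple[int, int, float]:
    """Observe one camera per time step and stop once the belief is confident.

    views[t][c] holds hypothetical per-class evidence scores from camera c
    at time t. select_camera is a stand-in for a learned policy: it maps
    (time step, current belief) to a camera index.
    Returns (predicted class, number of frames observed, confidence).
    """
    num_classes = len(views[0][0])
    logits = [0.0] * num_classes
    belief = softmax(logits)
    for t, frame_views in enumerate(views):
        cam = select_camera(t, belief)      # only one camera is read per step
        evidence = frame_views[cam]
        # Assumed update rule: accumulate evidence additively.
        logits = [l + e for l, e in zip(logits, evidence)]
        belief = softmax(logits)
        best = max(range(num_classes), key=belief.__getitem__)
        if belief[best] >= threshold:       # confident enough: decide early
            return best, t + 1, belief[best]
    best = max(range(num_classes), key=belief.__getitem__)
    return best, len(views), belief[best]


if __name__ == "__main__":
    # Toy data: camera 0 is informative for class 0, camera 1 is not.
    toy_views = [[[2.0, 0.0], [0.0, 0.0]] for _ in range(5)]
    print(early_recognition(toy_views, lambda t, belief: 0))
    print(early_recognition(toy_views, lambda t, belief: 1))
```

In this toy setting, a policy that keeps choosing the informative camera reaches the threshold after fewer frames than one that keeps choosing the uninformative camera, which is the kind of behavior a learned view selection policy is meant to encourage.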

Cite

Text

Wang et al. "Active Vision for Early Recognition of Human Actions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00116

Markdown

[Wang et al. "Active Vision for Early Recognition of Human Actions." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/wang2020cvpr-active/) doi:10.1109/CVPR42600.2020.00116

BibTeX

@inproceedings{wang2020cvpr-active,
  title     = {{Active Vision for Early Recognition of Human Actions}},
  author    = {Wang, Boyu and Huang, Lihan and Hoai, Minh},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00116},
  url       = {https://mlanthology.org/cvpr/2020/wang2020cvpr-active/}
}