Uncertainty-Based Spatial-Temporal Attention for Online Action Detection

Abstract

Online action detection aims to detect the ongoing action in a streaming video. In this paper, we propose an uncertainty-based spatial-temporal attention mechanism for online action detection. By explicitly modeling the distribution of model parameters, we extend the baseline models in a probabilistic manner. We then quantify the predictive uncertainty and use it to generate spatial-temporal attention that focuses on the regions and frames with large mutual information. For inference, we introduce a two-stream framework that combines the baseline model and the probabilistic model based on the input uncertainty. We validate the effectiveness of our method on three benchmark datasets: THUMOS-14, TVSeries, and HDD. Furthermore, we demonstrate that our method generalizes better under different views and occlusions, and is more robust when trained on small-scale data.
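The abstract does not give the paper's formulas, so the following is only a generic sketch of how mutual information can be estimated from sampled predictions and turned into per-frame weights. It assumes the standard decomposition in which mutual information is the entropy of the averaged prediction minus the average entropy of the individual predictions. The function names, array shapes, and the sum-to-one normalization are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats along `axis`; eps guards against log(0)."""
    return -np.sum(p * np.log(p + eps), axis=axis)


def mutual_information(probs):
    """Estimate mutual information between prediction and parameters.

    probs: shape (S, T, C), class probabilities for T frames obtained from
           S samples of the model parameters (assumed to be given).
    Returns an array of shape (T,).
    """
    total = entropy(probs.mean(axis=0))     # entropy of the averaged prediction
    expected = entropy(probs).mean(axis=0)  # average entropy over the samples
    return total - expected


def uncertainty_attention(mi, eps=1e-12):
    """Hypothetical weighting: normalize non-negative scores to sum to 1."""
    mi = np.clip(mi, 0.0, None) + eps
    return mi / mi.sum()


# Two parameter samples, two frames, two classes.
# The samples agree on frame 0 and disagree on frame 1.
probs = np.array([
    [[0.9, 0.1], [0.9, 0.1]],
    [[0.9, 0.1], [0.1, 0.9]],
])
mi = mutual_information(probs)
attention = uncertainty_attention(mi)
print(mi, attention)
```

In this toy input the mutual information is essentially zero for frame 0, where the samples agree, and clearly positive for frame 1, so almost all of the weight falls on frame 1. The same computation applies to spatial regions if the frame axis is replaced by a region axis.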

Cite

Text

Guo et al. "Uncertainty-Based Spatial-Temporal Attention for Online Action Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19772-7_5

Markdown

[Guo et al. "Uncertainty-Based Spatial-Temporal Attention for Online Action Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/guo2022eccv-uncertaintybased/) doi:10.1007/978-3-031-19772-7_5

BibTeX

@inproceedings{guo2022eccv-uncertaintybased,
  title     = {{Uncertainty-Based Spatial-Temporal Attention for Online Action Detection}},
  author    = {Guo, Hongji and Ren, Zhou and Wu, Yi and Hua, Gang and Ji, Qiang},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19772-7_5},
  url       = {https://mlanthology.org/eccv/2022/guo2022eccv-uncertaintybased/}
}