Spatio-Temporal Dynamic Inference Network for Group Activity Recognition

Abstract

Group activity recognition aims to understand the activity performed by a group of people. Modeling complex spatio-temporal interactions is the key to solving it. Previous methods are limited to reasoning on a predefined graph, which ignores the inherent person-specific interaction context. Moreover, they adopt inference schemes that are computationally expensive and easily lead to over-smoothing. In this paper, we achieve spatio-temporal person-specific inference by proposing the Dynamic Inference Network (DIN), which is composed of a Dynamic Relation (DR) module and a Dynamic Walk (DW) module. We first initialize interaction fields on a primary spatio-temporal graph. Within each interaction field, we apply DR to predict the relation matrix and DW to predict the dynamic walk offsets in a joint-processing manner, thus forming a person-specific interaction graph. By updating features on this person-specific graph, a person can possess a global-level interaction field with a local initialization. Experiments indicate that both modules are effective. Moreover, DIN achieves significant improvement over previous state-of-the-art methods on two popular datasets under the same setting, while incurring much less computational overhead in the reasoning module.
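
The update described in the abstract can be illustrated with a rough NumPy sketch. It is not the authors' implementation: the function name `dynamic_inference_step`, the weight matrices `w_rel` and `w_walk`, the cross-shaped field, the softmax normalization, the residual update, and the rounding of offsets to integer grid positions are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np

def dynamic_inference_step(x, w_rel, w_walk, field):
    """One illustrative DR + DW update on person features (hypothetical sketch).

    x:      (T, N, D) features for N persons over T frames.
    w_rel:  (K * D, K) assumed weights predicting K relation scores.
    w_walk: (K * D, 2 * K) assumed weights predicting K (dt, dn) offsets.
    field:  (K, 2) integer (dt, dn) offsets defining the initial interaction field.
    """
    T, N, D = x.shape
    K = field.shape[0]
    out = np.empty_like(x)
    for t in range(T):
        for n in range(N):
            # Gather the local interaction field around person (t, n),
            # clamping positions that fall outside the graph.
            ts = np.clip(t + field[:, 0], 0, T - 1)
            ns = np.clip(n + field[:, 1], 0, N - 1)
            ctx = x[ts, ns].reshape(-1)  # (K * D,)

            # DR (assumed form): softmax relation weights over the K positions.
            scores = ctx @ w_rel
            scores = scores - scores.max()
            rel = np.exp(scores) / np.exp(scores).sum()  # (K,)

            # DW (assumed form): predicted offsets move each field position.
            # Rounding to integers is a simplification chosen for this sketch.
            walk = np.rint((ctx @ w_walk).reshape(K, 2)).astype(int)
            ts_w = np.clip(ts + walk[:, 0], 0, T - 1)
            ns_w = np.clip(ns + walk[:, 1], 0, N - 1)

            # Update on the person-specific graph: residual plus
            # relation-weighted features at the walked positions.
            out[t, n] = x[t, n] + rel @ x[ts_w, ns_w]
    return out

# Toy usage with random features and small random weights.
rng = np.random.default_rng(0)
T, N, D = 3, 4, 8
field = np.array([[0, -1], [0, 0], [0, 1], [-1, 0], [1, 0]])
K = len(field)
x = rng.normal(size=(T, N, D))
w_rel = rng.normal(size=(K * D, K)) * 0.1
w_walk = rng.normal(size=(K * D, 2 * K)) * 0.1
y = dynamic_inference_step(x, w_rel, w_walk, field)
print(y.shape)
```

In this sketch each person starts from a small local field, and the predicted offsets let that field reach positions outside the initial neighborhood, which is one way to read "a global-level interaction field with a local initialization".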

Cite

Text

Yuan et al. "Spatio-Temporal Dynamic Inference Network for Group Activity Recognition." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00738

Markdown

[Yuan et al. "Spatio-Temporal Dynamic Inference Network for Group Activity Recognition." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/yuan2021iccv-spatiotemporal/) doi:10.1109/ICCV48922.2021.00738

BibTeX

@inproceedings{yuan2021iccv-spatiotemporal,
  title     = {{Spatio-Temporal Dynamic Inference Network for Group Activity Recognition}},
  author    = {Yuan, Hangjie and Ni, Dong and Wang, Mang},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {7476-7485},
  doi       = {10.1109/ICCV48922.2021.00738},
  url       = {https://mlanthology.org/iccv/2021/yuan2021iccv-spatiotemporal/}
}