Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization

Abstract

The key to video action detection lies in understanding the interactions between persons and background objects in a video. Current methods usually employ object detectors to extract objects directly or use grid features to represent objects in the environment, an approach that underestimates the potential of multi-scale context information (e.g., objects and scenes of different sizes). How to represent the multi-scale context accurately and make full use of it remains an unresolved challenge for spatial-temporal action localization. In this paper, we propose a novel Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network (AMCRNet) that extracts multi-scale context through multiple pooling layers of different sizes. Specifically, we develop an Interactive Relation Extraction module to model the higher-order relation between the target person and the context (e.g., other persons and objects). Along this line, we further propose a History Feature Bank and Interaction method that improves performance by modeling this relation across consecutive video clips. Extensive experimental results on AVA2.2 and UCF101-24 demonstrate the superiority and rationality of the proposed AMCRNet.
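To make the idea of extracting context "through multiple pooling layers of different sizes" concrete, the following is a minimal sketch in plain Python and not the authors' implementation. It assumes a single-channel 2D feature map, non-overlapping average pooling, and kernel sizes of 1, 2, and 4; the function names `avg_pool2d` and `multi_scale_context` and all of these choices are hypothetical, since the abstract does not specify the pooling type, the kernel sizes, or how the pooled features are combined.

```python
from typing import List, Sequence

Grid = List[List[float]]


def avg_pool2d(feature_map: Grid, kernel: int) -> Grid:
    """Non-overlapping average pooling (stride == kernel) over a 2D grid."""
    height, width = len(feature_map), len(feature_map[0])
    if height % kernel or width % kernel:
        raise ValueError("kernel must divide both grid dimensions")
    pooled = []
    for top in range(0, height, kernel):
        row = []
        for left in range(0, width, kernel):
            # Collect every value inside the current kernel x kernel window.
            window = [
                feature_map[r][c]
                for r in range(top, top + kernel)
                for c in range(left, left + kernel)
            ]
            row.append(sum(window) / len(window))
        pooled.append(row)
    return pooled


def multi_scale_context(feature_map: Grid, kernels: Sequence[int] = (1, 2, 4)) -> List[float]:
    """Pool at each kernel size and flatten every cell into one list of context values.

    Small kernels keep fine-grained cells (small objects); large kernels
    summarize wide regions (scene-level context). Kernel sizes are assumed.
    """
    context: List[float] = []
    for kernel in kernels:
        for row in avg_pool2d(feature_map, kernel):
            context.extend(row)
    return context


# Toy 4x4 single-channel feature map with values 0..15.
toy = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
context = multi_scale_context(toy)
print(len(context))  # 16 fine cells + 4 mid-scale cells + 1 global cell = 21
```

In this sketch each pooled cell acts as one context element at a given scale. A relation module, such as the Interactive Relation Extraction module the abstract mentions, would then relate actor features to these elements; that step is not shown here.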

Cite

Text

Yu et al. "Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization." International Joint Conference on Artificial Intelligence, 2023. doi:10.24963/IJCAI.2023/186

Markdown

[Yu et al. "Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization." International Joint Conference on Artificial Intelligence, 2023.](https://mlanthology.org/ijcai/2023/yu2023ijcai-actor/) doi:10.24963/IJCAI.2023/186

BibTeX

@inproceedings{yu2023ijcai-actor,
  title     = {{Actor-Multi-Scale Context Bidirectional Higher Order Interactive Relation Network for Spatial-Temporal Action Localization}},
  author    = {Yu, Jun and Zheng, Yingshuai and Ruan, Shulan and Liu, Qi and Cheng, Zhiyuan and Wu, Jinze},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {1676-1685},
  doi       = {10.24963/IJCAI.2023/186},
  url       = {https://mlanthology.org/ijcai/2023/yu2023ijcai-actor/}
}