Multi-Modal Multi-Action Video Recognition

Abstract

Multi-action video recognition is more challenging than single-action recognition because it requires recognizing multiple actions that co-occur simultaneously or sequentially. Modeling multi-action relations is crucial to understanding videos with multiple actions, and the actions in a video are usually presented in multiple modalities. In this paper, we propose a novel multi-action relation model for videos that leverages both relational graph convolutional networks (GCNs) and video multi-modality. We first build multi-modal GCNs to explore modality-aware multi-action relations, feeding them modality-specific action representations as node features, e.g., spatiotemporal features learned by a 3D convolutional neural network (CNN), and audio and textual embeddings queried from respective feature lexicons. We then jointly combine the multi-modal CNN-GCN models and multi-modal feature representations to learn better relational action predictions. An ablation study, multi-action relation visualizations, and boost analysis all demonstrate the efficacy of our multi-modal multi-action relation modeling. Our method also achieves state-of-the-art performance on the large-scale multi-action M-MiT benchmark. Our code is publicly available at https://github.com/zhenglab/multi-action-video.
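The modality-aware GCN idea in the abstract can be sketched minimally: treat action classes as graph nodes, run a GCN layer over a co-occurrence adjacency using per-modality node features, and fuse the per-modality outputs. The toy below is a hypothetical NumPy sketch under those assumptions — the graph, feature dimensions, and summation fusion are illustrative placeholders, not the paper's actual implementation.

```python
import numpy as np

def normalize_adj(A):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2},
    # the standard GCN propagation matrix (Kipf & Welling).
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def gcn_layer(X, A_norm, W):
    # One graph-convolution layer with ReLU: relu(A_norm @ X @ W).
    return np.maximum(A_norm @ X @ W, 0.0)

rng = np.random.default_rng(0)

# Hypothetical setup: 5 action classes as nodes, with a symmetric
# binary co-occurrence adjacency (in practice this would be estimated
# from label co-occurrence statistics in the training set).
A = (rng.random((5, 5)) > 0.5).astype(float)
np.fill_diagonal(A, 0.0)
A = np.maximum(A, A.T)
A_norm = normalize_adj(A)

# Modality-specific node features (e.g., from a 3D CNN for RGB and an
# audio embedding lexicon); dimensions are arbitrary for the sketch.
X_rgb = rng.standard_normal((5, 16))
X_audio = rng.standard_normal((5, 16))
W = rng.standard_normal((16, 8)) * 0.1

# Each modality gets its own relation-modeling pass over the graph;
# outputs are fused (here, simple summation) into relational features
# that a classifier head could consume.
H_rgb = gcn_layer(X_rgb, A_norm, W)
H_audio = gcn_layer(X_audio, A_norm, W)
H = H_rgb + H_audio
```

A real system would learn `A` and per-modality weights end-to-end and add a prediction head, but the propagation pattern per modality is the same as above.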

Cite

Text

Shi et al. "Multi-Modal Multi-Action Video Recognition." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01342

Markdown

[Shi et al. "Multi-Modal Multi-Action Video Recognition." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/shi2021iccv-multimodal/) doi:10.1109/ICCV48922.2021.01342

BibTeX

@inproceedings{shi2021iccv-multimodal,
  title     = {{Multi-Modal Multi-Action Video Recognition}},
  author    = {Shi, Zhensheng and Liang, Ju and Li, Qianqian and Zheng, Haiyong and Gu, Zhaorui and Dong, Junyu and Zheng, Bing},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {13678--13687},
  doi       = {10.1109/ICCV48922.2021.01342},
  url       = {https://mlanthology.org/iccv/2021/shi2021iccv-multimodal/}
}