Graph-Based High-Order Relation Modeling for Long-Term Action Recognition

Abstract

Long-term actions involve many important visual concepts, such as objects, motions, and sub-actions, and the various relations among these concepts are what we call basic relations. These basic relations jointly affect one another as a long-term action evolves over time, forming high-order relations that are essential for long-term action recognition. In this paper, we propose a Graph-based High-order Relation Modeling (GHRM) module to exploit these high-order relations. In GHRM, each basic relation is modeled by a graph in which each node represents a segment of a long video. Moreover, when modeling each basic relation, GHRM incorporates information from all the other basic relations, so that the high-order relations can be exploited. To better exploit the high-order relations along the time dimension, we design a GHRM-layer consisting of a Temporal-GHRM branch and a Semantic-GHRM branch, which model local temporal and global semantic high-order relations, respectively. Experimental results on three long-term action recognition datasets, namely Breakfast, Charades, and MultiTHUMOS, demonstrate the effectiveness of our model.
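The idea of one graph per basic relation, with each graph informed by all the others, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how graphs are built or fused, so the similarity-based adjacency, the averaging of the other graphs, and every name here (`relation_graphs`, `high_order_update`, `mix`, `window`) are assumptions made purely for illustration. The `window` argument loosely mimics a local temporal branch, and omitting it mimics a global semantic branch.

```python
import numpy as np

def row_softmax(scores, mask=None):
    """Row-wise softmax; entries where mask is False get zero weight."""
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def relation_graphs(segments, projections, window=None):
    """Build one adjacency matrix per basic relation; nodes are video segments.

    Assumption: each relation is a learned projection followed by
    scaled dot-product similarity. If `window` is given, a segment only
    connects to segments at most `window` steps away (local graph).
    """
    t = segments.shape[0]
    mask = None
    if window is not None:
        idx = np.arange(t)
        mask = np.abs(idx[:, None] - idx[None, :]) <= window
    graphs = []
    for w in projections:
        z = segments @ w                                   # (T, d_k)
        graphs.append(row_softmax(z @ z.T / np.sqrt(z.shape[1]), mask))
    return np.stack(graphs)                                # (K, T, T)

def high_order_update(segments, graphs, mix=0.5):
    """Fuse each relation graph with the mean of the others, then propagate.

    Assumption: a convex combination stands in for "incorporating
    information from all the other basic relations". Requires K >= 2.
    """
    k = graphs.shape[0]
    total = graphs.sum(axis=0)
    out = np.zeros_like(segments)
    for i in range(k):
        others = (total - graphs[i]) / (k - 1)
        fused = (1.0 - mix) * graphs[i] + mix * others     # rows still sum to 1
        out += fused @ segments                            # message passing
    return out / k

# Toy input: 8 segments with 16-dim features and 3 hypothetical relations.
rng = np.random.default_rng(0)
T, D, K = 8, 16, 3
segments = rng.normal(size=(T, D))
projections = [rng.normal(size=(D, 4)) for _ in range(K)]

local = high_order_update(segments, relation_graphs(segments, projections, window=1))
global_ = high_order_update(segments, relation_graphs(segments, projections))
features = local + global_                                 # (T, D)
```

In a trained model the projections would be learned and the fusion would likely be more expressive than a fixed average; the sketch only shows how several segment-level graphs can share information before features are propagated.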

Cite

Text

Zhou et al. "Graph-Based High-Order Relation Modeling for Long-Term Action Recognition." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00887

Markdown

[Zhou et al. "Graph-Based High-Order Relation Modeling for Long-Term Action Recognition." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/zhou2021cvpr-graphbased/) doi:10.1109/CVPR46437.2021.00887

BibTeX

@inproceedings{zhou2021cvpr-graphbased,
  title     = {{Graph-Based High-Order Relation Modeling for Long-Term Action Recognition}},
  author    = {Zhou, Jiaming and Lin, Kun-Yu and Li, Haoxin and Zheng, Wei-Shi},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {8984-8993},
  doi       = {10.1109/CVPR46437.2021.00887},
  url       = {https://mlanthology.org/cvpr/2021/zhou2021cvpr-graphbased/}
}