Relational Prototypical Network for Weakly Supervised Temporal Action Localization

Abstract

In this paper, we propose a weakly supervised temporal action localization method for untrimmed videos based on prototypical networks. We observe two challenges posed by weak supervision, namely action-background separation and action relation construction. Unlike previous methods, we propose to achieve action-background separation using only the original videos. To this end, a clustering loss is adopted to separate actions from backgrounds and learn intra-compact features, which helps in detecting complete action instances. In addition, a similarity weighting module is devised to further separate actions from backgrounds. To effectively identify actions, we propose to construct relations among actions for prototype learning. A GCN-based prototype embedding module is introduced to generate relational prototypes. Experiments on the THUMOS14 and ActivityNet1.2 datasets show that our method outperforms the state-of-the-art methods.
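To make the core idea concrete, here is a minimal NumPy sketch of prototype-based segment scoring with a GCN-style relational refinement of class prototypes. This is an illustration under simplifying assumptions, not the paper's exact formulation: the function names, the similarity-softmax adjacency, and the single propagation step are all hypothetical choices made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relational_prototypes(prototypes, weight):
    """One illustrative GCN-style propagation step over class prototypes.

    An adjacency matrix is built from pairwise prototype similarity,
    row-normalized with a softmax, and used to mix information between
    related action classes before scoring.
    """
    sim = prototypes @ prototypes.T                             # (C, C) affinities
    adj = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)  # row softmax
    return np.maximum(adj @ prototypes @ weight, 0.0)           # propagate + ReLU

def segment_scores(features, prototypes):
    """Score each video segment against each class prototype (dot product)."""
    return features @ prototypes.T                              # (T, C)

# Toy setup: T=5 segments, D=8 feature dims, C=3 action classes.
features = rng.standard_normal((5, 8))
protos = rng.standard_normal((3, 8))
weight = rng.standard_normal((8, 8)) * 0.1

rel_protos = relational_prototypes(protos, weight)
scores = segment_scores(features, rel_protos)
print(scores.shape)  # (5, 3): per-segment class scores
```

In the actual method, the prototypes would be learned jointly with the clustering loss and similarity weighting described above; here they are random placeholders only to show the data flow.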

Cite

Text

Huang et al. "Relational Prototypical Network for Weakly Supervised Temporal Action Localization." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I07.6760

Markdown

[Huang et al. "Relational Prototypical Network for Weakly Supervised Temporal Action Localization." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/huang2020aaai-relational/) doi:10.1609/AAAI.V34I07.6760

BibTeX

@inproceedings{huang2020aaai-relational,
  title     = {{Relational Prototypical Network for Weakly Supervised Temporal Action Localization}},
  author    = {Huang, Linjiang and Huang, Yan and Ouyang, Wanli and Wang, Liang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {11053--11060},
  doi       = {10.1609/AAAI.V34I07.6760},
  url       = {https://mlanthology.org/aaai/2020/huang2020aaai-relational/}
}