Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

Huang, Linjiang; Wang, Liang; Li, Hongsheng

doi:10.1109/ICCV48922.2021.00790

Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization

Linjiang Huang, Liang Wang, Hongsheng Li

ICCV 2021 pp. 8002-8011

doi:10.1109/ICCV48922.2021.00790 /iccv/2021/huang2021iccv-foregroundaction/

Abstract

As a challenging task of high-level video understanding, weakly supervised temporal action localization has been attracting increasing attention. With only video annotations, most existing methods seek to handle this task with a localization-by-classification framework, which generally adopts a selector to select snippets of high probabilities of actions or namely the foreground. Nevertheless, the existing foreground selection strategies have a major limitation of only considering the unilateral relation from foreground to actions, which cannot guarantee the foreground-action consistency. In this paper, we present a framework named FAC-Net based on the I3D backbone, on which three branches are appended, named class-wise foreground classification branch, class-agnostic attention branch and multiple instance learning branch. First, our class-wise foreground classification branch regularizes the relation between actions and foreground to maximize the foreground-background separation. Besides, the class-agnostic attention branch and multiple instance learning branch are adopted to regularize the foreground-action consistency and help to learn a meaningful foreground classifier. Within each branch, we introduce a hybrid attention mechanism, which calculates multiple attention scores for each snippet, to focus on both discriminative and less-discriminative snippets to capture the full action boundaries. Experimental results on THUMOS14 and ActivityNet1.3 demonstrate the superior performance over state-of-the-art approaches.

PDF ICCV Semantic Scholar

Cite

Text

Huang et al. "Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00790

Markdown

[Huang et al. "Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/huang2021iccv-foregroundaction/) doi:10.1109/ICCV48922.2021.00790

BibTeX

@inproceedings{huang2021iccv-foregroundaction,
  title     = {{Foreground-Action Consistency Network for Weakly Supervised Temporal Action Localization}},
  author    = {Huang, Linjiang and Wang, Liang and Li, Hongsheng},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {8002-8011},
  doi       = {10.1109/ICCV48922.2021.00790},
  url       = {https://mlanthology.org/iccv/2021/huang2021iccv-foregroundaction/}
}