Learning Salient Boundary Feature for Anchor-Free Temporal Action Localization

Abstract

Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and the start and end frames of each action instance in a long, untrimmed video. While most current models achieve good results by using pre-defined anchors and numerous actionness scores, such methods are burdened by a large number of outputs and by heavy tuning of the locations and sizes of the different anchors. Anchor-free methods, in contrast, are lighter and free of these redundant hyper-parameters, yet they have received little attention. In this paper, we propose the first purely anchor-free temporal localization method, which is both efficient and effective. Our model includes (i) an end-to-end trainable basic predictor, (ii) a saliency-based refinement module that gathers more valuable boundary features for each proposal via a novel boundary pooling, and (iii) several consistency constraints that ensure our model can find the accurate boundary given arbitrary proposals. Extensive experiments show that our method beats all anchor-based and actionness-guided methods by a remarkable margin on THUMOS14, achieving state-of-the-art results, and obtains comparable results on ActivityNet v1.3. Our code will be made available upon publication.
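
To make the boundary-pooling idea concrete, below is a minimal PyTorch sketch, not the paper's exact operator: it gathers a salient (max-pooled) feature from a small window around a proposal's predicted start and end and concatenates them with the proposal's center feature. The names boundary_pooled_feature, feats, start, end, and window are illustrative assumptions, not taken from the paper.

# Minimal sketch (assumed, not the paper's exact boundary pooling):
# max-pool the 1D temporal feature map in small windows around the predicted
# start and end of a proposal, then concatenate with the center feature.
import torch

def boundary_pooled_feature(feats: torch.Tensor, start: int, end: int, window: int = 2) -> torch.Tensor:
    """feats: (C, T) temporal feature map; start/end: proposal boundary indices."""
    C, T = feats.shape

    def pool(center: int) -> torch.Tensor:
        # Salient (max) response in a local window around the boundary location.
        lo, hi = max(0, center - window), min(T, center + window + 1)
        return feats[:, lo:hi].max(dim=1).values

    center = (start + end) // 2
    # Concatenate start-boundary, center, and end-boundary features -> (3C,)
    return torch.cat([pool(start), feats[:, center], pool(end)], dim=0)

# Example: a 256-channel feature map over 100 temporal locations, one proposal.
feats = torch.randn(256, 100)
f = boundary_pooled_feature(feats, start=20, end=55)
print(f.shape)  # torch.Size([768])

A downstream refinement head could consume such a concatenated feature to regress boundary offsets for each proposal; the exact refinement and consistency losses are described in the paper itself.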

Cite

Text

Lin et al. "Learning Salient Boundary Feature for Anchor-Free Temporal Action Localization." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00333

Markdown

[Lin et al. "Learning Salient Boundary Feature for Anchor-Free Temporal Action Localization." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/lin2021cvpr-learning/) doi:10.1109/CVPR46437.2021.00333

BibTeX

@inproceedings{lin2021cvpr-learning,
  title     = {{Learning Salient Boundary Feature for Anchor-Free Temporal Action Localization}},
  author    = {Lin, Chuming and Xu, Chengming and Luo, Donghao and Wang, Yabiao and Tai, Ying and Wang, Chengjie and Li, Jilin and Huang, Feiyue and Fu, Yanwei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {3320--3329},
  doi       = {10.1109/CVPR46437.2021.00333},
  url       = {https://mlanthology.org/cvpr/2021/lin2021cvpr-learning/}
}