Learning to Refactor Action and Co-Occurrence Features for Temporal Action Localization
Abstract
The main challenge of temporal action localization is to retrieve subtle human actions from various co-occurring ingredients, e.g., context and background, in an untrimmed video. While prior approaches have achieved substantial progress by devising advanced action detectors, they still suffer from these co-occurring ingredients, which often dominate the actual action content in videos. In this paper, we explore two orthogonal but complementary aspects of a video snippet, i.e., the action features and the co-occurrence features. Specifically, we develop a novel auxiliary task that decouples these two types of features within a video snippet and recombines them to generate a new feature representation with more salient action information for accurate action localization. We term our method RefactorNet: it first explicitly factorizes the action content and regularizes its co-occurrence features, and then synthesizes a new action-dominated video representation. Extensive experiments and ablation studies on THUMOS14 and ActivityNet v1.3 demonstrate that our new representation, combined with a simple action detector, significantly improves action localization performance.
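To make the decouple-and-recombine idea concrete, below is a minimal PyTorch sketch of a refactoring module: two parallel encoders factorize a snippet feature into action and co-occurrence components, and a learned gate suppresses the co-occurrence part before recombination. The names (`RefactorNetSketch`, `action_encoder`, `cooccur_encoder`), layer sizes, and the gated fusion are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn


class RefactorNetSketch(nn.Module):
    """Sketch of the refactoring idea: decouple a snippet feature into
    action and co-occurrence components, then recombine them into an
    action-dominated representation. Dimensions and the fusion scheme
    are assumptions for illustration only."""

    def __init__(self, feat_dim: int = 2048, hidden_dim: int = 512):
        super().__init__()
        # Two parallel encoders factorize the snippet feature.
        self.action_encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )
        self.cooccur_encoder = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )
        # Learned gate that down-weights the co-occurrence component so
        # the recombined feature is dominated by action information.
        self.gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, snippet_feat: torch.Tensor) -> torch.Tensor:
        # snippet_feat: (batch, num_snippets, feat_dim) per-snippet features.
        action = self.action_encoder(snippet_feat)
        cooccur = self.cooccur_encoder(snippet_feat)
        # Regularize the co-occurrence content before recombining
        # (an assumed fusion; the paper's actual scheme may differ).
        refactored = action + self.gate(action) * cooccur
        return refactored  # fed to a downstream action detector


# Usage: refactor per-snippet video features before localization.
feats = torch.randn(2, 100, 2048)  # e.g., two videos, 100 snippets each
model = RefactorNetSketch()
new_feats = model(feats)
print(new_feats.shape)  # torch.Size([2, 100, 2048])
```

In this sketch the refactored representation keeps the same dimensionality as the input, so it can be dropped in front of any existing snippet-level action detector without modification.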
Cite
Text
Xia et al. "Learning to Refactor Action and Co-Occurrence Features for Temporal Action Localization." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01351
Markdown
[Xia et al. "Learning to Refactor Action and Co-Occurrence Features for Temporal Action Localization." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/xia2022cvpr-learning/) doi:10.1109/CVPR52688.2022.01351
BibTeX
@inproceedings{xia2022cvpr-learning,
title = {{Learning to Refactor Action and Co-Occurrence Features for Temporal Action Localization}},
author = {Xia, Kun and Wang, Le and Zhou, Sanping and Zheng, Nanning and Tang, Wei},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {13884--13893},
doi = {10.1109/CVPR52688.2022.01351},
url = {https://mlanthology.org/cvpr/2022/xia2022cvpr-learning/}
}