Temporal RPN Learning for Weakly-Supervised Temporal Action Localization
Abstract
Weakly-Supervised Temporal Action Localization (WSTAL) aims to train an action instance localization model from untrimmed videos using only video-level labels, a setting similar to the Object Detection (OD) task. Existing Top-k MIL-based WSTAL methods cannot flexibly define the learning space, which limits the model’s learning efficiency and performance. Faster R-CNN is a classic two-stage object detection architecture with an efficient Region Proposal Network (RPN). This paper migrates a Faster R-CNN-like two-stage architecture to the WSTAL task: first, we build a temporal RPN (T-RPN) and integrate it with the traditional WSTAL framework; second, we propose a pseudo-label generation mechanism that enables T-RPN learning without temporal annotations. Our new framework achieves breakthrough performance on the THUMOS-14 and ActivityNet-v1.2 datasets, and comprehensive ablation experiments verify the effectiveness of these innovations. Code will be available at: https://github.com/ZJUHJ/TRPN.
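For readers unfamiliar with the Top-k MIL baseline that the abstract contrasts against, the following is a minimal sketch of generic top-k multiple-instance pooling: per-snippet class scores are aggregated into a video-level score by averaging the k highest-scoring snippets for each class. It is not the authors' T-RPN or their released code; the function name `topk_mil_video_scores`, the list-of-lists input layout, and the example scores are assumptions made for illustration only.

```python
from typing import List


def topk_mil_video_scores(cas: List[List[float]], k: int) -> List[float]:
    """Aggregate snippet-level class scores into video-level scores.

    cas: hypothetical class activation sequence, one row per snippet and
         one column per action class (shape: num_snippets x num_classes).
    k:   number of top-scoring snippets averaged per class.
    """
    if not cas:
        raise ValueError("cas must contain at least one snippet")
    num_snippets = len(cas)
    num_classes = len(cas[0])
    # Clamp k so it is always between 1 and the number of snippets.
    k = max(1, min(k, num_snippets))

    video_scores = []
    for c in range(num_classes):
        # Sort this class's snippet scores from highest to lowest.
        column = sorted((row[c] for row in cas), reverse=True)
        # Video-level score = mean of the k highest snippet scores.
        video_scores.append(sum(column[:k]) / k)
    return video_scores


# Illustrative input: 4 snippets, 2 classes (made-up scores).
example_cas = [
    [0.1, 0.9],
    [0.8, 0.2],
    [0.6, 0.4],
    [0.3, 0.7],
]
example_scores = topk_mil_video_scores(example_cas, k=2)
```

In this sketch only the top-k snippets per class contribute to the video-level score, which is then compared against the video-level label during training. That fixed selection rule is one way to read the abstract's remark that such methods cannot flexibly define the learning space.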
Cite
Text
Huang et al. "Temporal RPN Learning for Weakly-Supervised Temporal Action Localization." Proceedings of the 15th Asian Conference on Machine Learning, 2023.
Markdown
[Huang et al. "Temporal RPN Learning for Weakly-Supervised Temporal Action Localization." Proceedings of the 15th Asian Conference on Machine Learning, 2023.](https://mlanthology.org/acml/2023/huang2023acml-temporal/)
BibTeX
@inproceedings{huang2023acml-temporal,
title = {{Temporal RPN Learning for Weakly-Supervised Temporal Action Localization}},
author = {Huang, Jing and Kong, Ming and Chen, Luyuan and Liang, Tian and Zhu, Qiang},
booktitle = {Proceedings of the 15th Asian Conference on Machine Learning},
year = {2023},
pages = {470-485},
volume = {222},
url = {https://mlanthology.org/acml/2023/huang2023acml-temporal/}
}