Background Suppression Network for Weakly-Supervised Temporal Action Localization
Abstract
Weakly-supervised temporal action localization is a very challenging problem because frame-wise labels are not given at training time; the only supervision is video-level labels indicating whether each video contains action frames of interest. Previous methods aggregate frame-level class scores into video-level predictions and learn from video-level action labels. This formulation does not fully model the problem, in that background frames are forced to be misclassified as action classes for the video-level labels to be predicted accurately. In this paper, we design the Background Suppression Network (BaS-Net), which introduces an auxiliary class for background and has a two-branch weight-sharing architecture with an asymmetrical training strategy. This enables BaS-Net to suppress activations from background frames and thereby improve localization performance. Extensive experiments demonstrate the effectiveness of BaS-Net and its superiority over state-of-the-art methods on the most popular benchmarks – THUMOS'14 and ActivityNet. Our code and the trained model are available at https://github.com/Pilhyeon/BaSNet-pytorch.
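The two-branch design described in the abstract can be illustrated in a few lines of PyTorch (the authors' repository is PyTorch-based). The sketch below is a minimal illustration under stated assumptions, not the paper's exact implementation: the layer widths, the sigmoid filtering module, and the top-k pooling parameter `topk` are assumptions made for clarity. Both branches share one (C+1)-class classifier; the suppression branch sees features multiplied by learned foreground weights, so background snippets are attenuated before classification.

```python
import torch
import torch.nn as nn


class BaSNetSketch(nn.Module):
    """Minimal sketch of BaS-Net's two-branch, weight-sharing idea.

    Layer widths, the filtering module, and the top-k pooling below are
    illustrative assumptions, not the paper's exact configuration.
    """

    def __init__(self, feat_dim=2048, num_classes=20, topk=8):
        super().__init__()
        # Filtering module: predicts a per-snippet foreground weight in [0, 1].
        self.filter = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 1, kernel_size=1),
            nn.Sigmoid(),
        )
        # Shared classifier over C action classes + 1 auxiliary background class.
        self.classifier = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(512, num_classes + 1, kernel_size=1),
        )
        self.topk = topk

    def forward(self, x):
        # x: (batch, feat_dim, T) snippet-level features from a frozen backbone.
        fg_weight = self.filter(x)                 # (batch, 1, T)
        cas_base = self.classifier(x)              # base branch: raw features
        cas_supp = self.classifier(x * fg_weight)  # suppression branch: background attenuated
        # Aggregate class activation sequences into video-level scores
        # by mean-pooling the top-k snippet scores per class.
        k = min(self.topk, x.shape[-1])
        score_base = cas_base.topk(k, dim=-1).values.mean(dim=-1)  # (batch, C+1)
        score_supp = cas_supp.topk(k, dim=-1).values.mean(dim=-1)  # (batch, C+1)
        return score_base, score_supp
```

Under the asymmetrical training strategy named in the abstract, the base branch would be supervised with the background class labeled positive (every untrimmed video contains background), while the suppression branch would be supervised with background labeled negative, driving the filtering module to zero out background frames.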
Cite
Text

Lee et al. "Background Suppression Network for Weakly-Supervised Temporal Action Localization." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I07.6793

Markdown

[Lee et al. "Background Suppression Network for Weakly-Supervised Temporal Action Localization." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/lee2020aaai-background/) doi:10.1609/AAAI.V34I07.6793

BibTeX
@inproceedings{lee2020aaai-background,
title = {{Background Suppression Network for Weakly-Supervised Temporal Action Localization}},
author = {Lee, Pilhyeon and Uh, Youngjung and Byun, Hyeran},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2020},
pages = {11320--11327},
doi = {10.1609/AAAI.V34I07.6793},
url = {https://mlanthology.org/aaai/2020/lee2020aaai-background/}
}