Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization
Abstract
Weakly-supervised temporal action localization (WS-TAL) aims to localize action instances and recognize their categories with only video-level labels. Despite great progress, existing methods suffer from severe action-background ambiguity, which mainly comes from background noise introduced by aggregation operations and large intra-action variations caused by the task gap between classification and localization. To address this issue, we propose a generalized evidential deep learning (EDL) framework for WS-TAL, called Dual-Evidential Learning for Uncertainty modeling (DELU), which extends the traditional paradigm of EDL to adapt to the weakly-supervised multi-label classification goal. Specifically, to adaptively exclude undesirable background snippets, we use video-level uncertainty to measure how much background noise interferes with the video-level prediction. Then, snippet-level uncertainty is induced for progressive learning, which gradually focuses on entire action instances in an “easy-to-hard” manner. Extensive experiments show that DELU achieves state-of-the-art performance on the THUMOS14 and ActivityNet1.2 benchmarks. Our code is available at github.com/MengyuanChen21/ECCV2022-DELU.
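For readers unfamiliar with EDL, the following is a minimal sketch of how the traditional EDL paradigm turns class logits into an uncertainty score. It shows the standard subjective-logic formulation only, not DELU's generalized video-level or snippet-level variants, which the abstract does not specify. The function name `edl_uncertainty` and the choice of softplus as the evidence function are assumptions made for illustration.

```python
import math


def edl_uncertainty(logits):
    """Standard EDL quantities for one K-class prediction (illustrative only)."""
    k = len(logits)
    # Evidence must be non-negative. Softplus is one common choice and is an
    # assumption here; it is written in a numerically stable form.
    evidence = [max(x, 0.0) + math.log1p(math.exp(-abs(x))) for x in logits]
    # Dirichlet parameters: alpha_k = e_k + 1.
    alpha = [e + 1.0 for e in evidence]
    # Dirichlet strength: S = sum_k alpha_k.
    strength = sum(alpha)
    # Belief mass per class: b_k = e_k / S.
    belief = [e / strength for e in evidence]
    # Uncertainty mass: u = K / S, so that sum(belief) + u == 1.
    uncertainty = k / strength
    # Expected class probabilities under the Dirichlet: alpha_k / S.
    expected_prob = [a / strength for a in alpha]
    return belief, uncertainty, expected_prob
```

In this formulation, more total evidence means a larger Dirichlet strength and therefore a lower uncertainty, so a prediction with little evidence for any class yields an uncertainty close to 1.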
Cite
Text
Chen et al. "Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19772-7_12
Markdown
[Chen et al. "Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/chen2022eccv-dualevidential/) doi:10.1007/978-3-031-19772-7_12
BibTeX
@inproceedings{chen2022eccv-dualevidential,
title = {{Dual-Evidential Learning for Weakly-Supervised Temporal Action Localization}},
author = {Chen, Mengyuan and Gao, Junyu and Yang, Shicai and Xu, Changsheng},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2022},
doi = {10.1007/978-3-031-19772-7_12},
url = {https://mlanthology.org/eccv/2022/chen2022eccv-dualevidential/}
}