Enriching Local and Global Contexts for Temporal Action Localization
Abstract
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for temporal localization and sufficient visual invariance for action classification. We address this challenge by enriching both the local and global contexts in the popular two-stage temporal localization framework, where action proposals are first generated, followed by action classification and temporal boundary regression. Our proposed model, dubbed ContextLoc, can be divided into three sub-networks: L-Net, G-Net and P-Net. L-Net enriches the local context via fine-grained modeling of snippet-level features, which is formulated as a query-and-retrieval process. G-Net enriches the global context via higher-level modeling of the video-level representation. In addition, we introduce a novel context adaptation module to adapt the global context to different proposals. P-Net further models the context-aware inter-proposal relations. We explore two existing models as the P-Net in our experiments. The efficacy of our proposed method is validated by experimental results on the THUMOS14 (54.3% at tIoU@0.5) and ActivityNet v1.3 (56.01% at tIoU@0.5) datasets, where it outperforms recent state-of-the-art methods. Code is available at https://github.com/buxiangzhiren/ContextLoc.
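The query-and-retrieval formulation attributed to L-Net can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a scaled dot-product attention in which a proposal-level feature is the query and the snippet-level features serve as both keys and values, followed by a residual sum. The function name `enrich_with_local_context`, the toy feature vectors, and the residual step are hypothetical choices made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def enrich_with_local_context(proposal_feat, snippet_feats):
    """Hypothetical query-and-retrieval step (assumed, not from the paper).

    Query: the proposal feature. Keys and values: the snippet features.
    Returns the enriched proposal feature and the attention weights.
    """
    dim = len(proposal_feat)
    scale = math.sqrt(dim)
    # Query step: score every snippet against the proposal feature.
    weights = softmax([dot(proposal_feat, s) / scale for s in snippet_feats])
    # Retrieval step: weighted sum of the snippet features.
    retrieved = [
        sum(w * s[d] for w, s in zip(weights, snippet_feats))
        for d in range(dim)
    ]
    # Assumed residual connection: add the retrieved context to the query.
    enriched = [p + r for p, r in zip(proposal_feat, retrieved)]
    return enriched, weights


# Toy example: one 2-D proposal feature and three 2-D snippet features.
proposal = [1.0, 0.0]
snippets = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
enriched, weights = enrich_with_local_context(proposal, snippets)
```

In this toy setup the snippet most similar to the proposal receives the largest weight, so the retrieved context is dominated by snippets that resemble the proposal.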
Cite
Text
Zhu et al. "Enriching Local and Global Contexts for Temporal Action Localization." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01326
Markdown
[Zhu et al. "Enriching Local and Global Contexts for Temporal Action Localization." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/zhu2021iccv-enriching/) doi:10.1109/ICCV48922.2021.01326
BibTeX
@inproceedings{zhu2021iccv-enriching,
title = {{Enriching Local and Global Contexts for Temporal Action Localization}},
author = {Zhu, Zixin and Tang, Wei and Wang, Le and Zheng, Nanning and Hua, Gang},
booktitle = {International Conference on Computer Vision},
year = {2021},
pages = {13516--13525},
doi = {10.1109/ICCV48922.2021.01326},
url = {https://mlanthology.org/iccv/2021/zhu2021iccv-enriching/}
}