Weakly Supervised Action Selection Learning in Video

Abstract

Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the absence of frame-level annotations causes the classifier to impart class bias on every frame. To address this, we propose the Action Selection Learning (ASL) approach to capture the general concept of action, a property we refer to as "actionness". Under ASL, the model is trained with a novel class-agnostic task to predict which frames will be selected by the classifier. Empirically, we show that ASL outperforms leading baselines on two popular benchmarks, THUMOS-14 and ActivityNet-1.2, with 10.3% and 5.7% relative improvement respectively. We further analyze the properties of ASL and demonstrate the importance of actionness. Full code for this work is available at https://github.com/layer6ai-labs/ASL
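The selection idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation (see the linked repository for that). It assumes that frame selection means taking the k highest-scoring frames per class, that the video-level score is the mean over those frames, and that the class-agnostic target is a binary mask marking the frames selected for the labeled class. The function names, the parameter k, and the use of binary cross-entropy are all hypothetical choices made for illustration.

```python
import numpy as np


def topk_video_prediction(frame_scores, k):
    """Aggregate frame-level scores into a video-level prediction.

    frame_scores: (T, C) array of per-frame class scores (assumed layout).
    Returns the (C,) video-level scores and a (k, C) array holding the
    indices of the frames selected for each class.
    """
    # Sort frames by descending score independently for each class.
    order = np.argsort(-frame_scores, axis=0)
    selected = order[:k]
    # Assumption: the video-level score is the mean of the top-k frame scores.
    video_scores = np.take_along_axis(frame_scores, selected, axis=0).mean(axis=0)
    return video_scores, selected


def selection_targets(selected, video_label, num_frames):
    """Class-agnostic binary target: 1 where a frame was selected for the
    labeled class, 0 elsewhere (an assumed reading of the selection task)."""
    target = np.zeros(num_frames)
    target[selected[:, video_label]] = 1.0
    return target


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted actionness and the target."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))
    return float(loss.mean())


# Toy usage: 4 frames, 2 classes, video labeled with class 0.
scores = np.array([[0.1, 0.9], [0.8, 0.2], [0.7, 0.3], [0.2, 0.1]])
video_scores, selected = topk_video_prediction(scores, k=2)
target = selection_targets(selected, video_label=0, num_frames=4)
# Stand-in for the output of a separate actionness predictor.
actionness = np.full(4, 0.5)
loss = binary_cross_entropy(actionness, target)
```

In this sketch the target depends only on whether a frame was selected, not on which class it belongs to, which is the sense in which the auxiliary task is class-agnostic.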

Cite

Text

Ma et al. "Weakly Supervised Action Selection Learning in Video." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00750

Markdown

[Ma et al. "Weakly Supervised Action Selection Learning in Video." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/ma2021cvpr-weakly/) doi:10.1109/CVPR46437.2021.00750

BibTeX

@inproceedings{ma2021cvpr-weakly,
  title     = {{Weakly Supervised Action Selection Learning in Video}},
  author    = {Ma, Junwei and Gorti, Satya Krishna and Volkovs, Maksims and Yu, Guangwei},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {7587-7596},
  doi       = {10.1109/CVPR46437.2021.00750},
  url       = {https://mlanthology.org/cvpr/2021/ma2021cvpr-weakly/}
}