CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization

Abstract

Temporal action localization has been one of the most popular tasks in video understanding, due to the importance of detecting action instances in videos. However, little progress has been made on extending it to work in an online fashion, even though many video-related tasks could benefit from going online as video streaming services grow. To this end, we introduce a new task called Online Temporal Action Localization (On-TAL), in which the goal is to immediately detect action instances from an untrimmed streaming video. The online setting makes the new task very challenging, because the actionness decision for every frame has to be made without access to future frames and because post-processing methods cannot be used to modify past action proposals. We propose a novel framework, Context-Aware Actionness Grouping (CAG), as a solution for On-TAL and train it with an imitation learning algorithm, which allows us to avoid sophisticated reward engineering. Evaluation of our work on THUMOS14 and ActivityNet1.3 shows significant improvement over non-naive baselines, demonstrating the effectiveness of our approach. As a by-product, our method can also be used for the Online Detection of Action Start (ODAS), where it also outperforms previous state-of-the-art models.
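
To make the online constraint concrete, the following is a minimal sketch of a threshold-based grouping of per-frame actionness scores. It is not the CAG framework or any baseline from the paper: the function name `online_grouping`, the threshold of 0.5, and the example scores are all hypothetical and chosen only for illustration. The sketch looks at one frame at a time and never revises a proposal after emitting it.

```python
from typing import Iterable, Iterator, Optional, Tuple


def online_grouping(
    scores: Iterable[float], threshold: float = 0.5
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) frame indices (end inclusive) as soon as an action ends.

    Hypothetical illustration only: a frame counts as "action" when its
    score is >= threshold. Only the current frame is inspected, and
    proposals that were already emitted are never modified.
    """
    start: Optional[int] = None  # start frame of the currently open action
    last = -1  # index of the most recent frame seen
    for t, score in enumerate(scores):
        last = t
        if score >= threshold:
            if start is None:
                start = t  # an action starts at the current frame
        elif start is not None:
            yield (start, t - 1)  # the action ended at the previous frame
            start = None
    if start is not None:
        yield (start, last)  # the stream ended while an action was open


# Hypothetical actionness scores for an 8-frame stream.
stream = [0.1, 0.8, 0.9, 0.2, 0.7, 0.3, 0.6, 0.9]
proposals = list(online_grouping(stream))
print(proposals)  # [(1, 2), (4, 4), (6, 7)]
```

In this sketch, a single low score splits what might be one action into separate proposals, and because nothing can be merged after the fact, the fragmentation stays in the output. That is the kind of difficulty the online setting introduces.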

Cite

Text

Kang et al. "CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01347

Markdown

[Kang et al. "CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/kang2021iccv-cagqil/) doi:10.1109/ICCV48922.2021.01347

BibTeX

@inproceedings{kang2021iccv-cagqil,
  title     = {{CAG-QIL: Context-Aware Actionness Grouping via Q Imitation Learning for Online Temporal Action Localization}},
  author    = {Kang, Hyolim and Kim, Kyungmin and Ko, Yumin and Kim, Seon Joo},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {13729--13738},
  doi       = {10.1109/ICCV48922.2021.01347},
  url       = {https://mlanthology.org/iccv/2021/kang2021iccv-cagqil/}
}