Enhancing Temporal Action Localization with Transfer Learning from Action Recognition
Abstract
Temporal localization of actions in videos has been of increasing interest in recent years. However, most existing approaches rely on complex architectures that are either expensive to train, inefficient at inference time, or require thorough and careful architecture engineering. Classical action recognition on pre-segmented clips, on the other hand, benefits from sophisticated deep architectures that have paved the way for highly reliable video clip classifiers. In this paper, we propose to use transfer learning to leverage the good results from action recognition for temporal localization. We apply a network that is inspired by the classical bag-of-words model for transfer learning and show that the resulting framewise class posteriors already provide good results without explicit temporal modeling. Further, we show that combining these features with a deep but simple convolutional network achieves state-of-the-art results on two challenging action localization datasets.
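To make the abstract's claim concrete, here is a minimal sketch of how framewise class posteriors alone can yield temporal localizations without explicit temporal modeling: take the per-frame argmax and merge runs of identical labels into segments. The function name, the background convention, and the toy data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def localize_from_posteriors(posteriors, background=0):
    """Group framewise argmax labels into temporal segments.

    posteriors: (T, C) array of per-frame class probabilities.
    Returns a list of (label, start, end) tuples (end exclusive),
    skipping the assumed background class.
    """
    labels = posteriors.argmax(axis=1)
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        # close the current run when the label changes or the video ends
        if t == len(labels) or labels[t] != labels[start]:
            if labels[start] != background:
                segments.append((int(labels[start]), start, t))
            start = t
    return segments

# toy example: 6 frames, 3 classes (class 0 = background)
p = np.array([
    [0.9, 0.05, 0.05],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7],
])
print(localize_from_posteriors(p))  # [(1, 1, 4), (2, 5, 6)]
```

In the paper's full model, such posteriors are additionally fed into a convolutional network rather than decoded greedily as here.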
Cite
Text
Iqbal et al. "Enhancing Temporal Action Localization with Transfer Learning from Action Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00191
Markdown
[Iqbal et al. "Enhancing Temporal Action Localization with Transfer Learning from Action Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/iqbal2019iccvw-enhancing/) doi:10.1109/ICCVW.2019.00191
BibTeX
@inproceedings{iqbal2019iccvw-enhancing,
title = {{Enhancing Temporal Action Localization with Transfer Learning from Action Recognition}},
author = {Iqbal, Ahsan and Richard, Alexander and Gall, Juergen},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {1533--1540},
doi = {10.1109/ICCVW.2019.00191},
url = {https://mlanthology.org/iccvw/2019/iqbal2019iccvw-enhancing/}
}