Hierarchical Self-Attention Network for Action Localization in Videos

Abstract

This paper presents a novel Hierarchical Self-Attention Network (HISAN) that generates spatial-temporal tubes for action localization in videos. HISAN combines a two-stream convolutional neural network (CNN) with a hierarchical bidirectional self-attention mechanism, which comprises two levels of bidirectional self-attention that capture both long-term temporal dependencies and spatial context to yield more precise action localization. In addition, a sequence rescoring (SR) algorithm is employed to resolve inconsistent detection scores caused by occlusion or background clutter. Moreover, a new fusion scheme is introduced that integrates not only the appearance and motion information from the two-stream network but also motion saliency, which mitigates the effect of camera motion. Simulations show that the new approach achieves performance competitive with state-of-the-art methods in action localization and recognition accuracy on the widely used UCF101-24 and J-HMDB datasets.

Cite

Text

Pramono et al. "Hierarchical Self-Attention Network for Action Localization in Videos." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00015

Markdown

[Pramono et al. "Hierarchical Self-Attention Network for Action Localization in Videos." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/pramono2019iccv-hierarchical/) doi:10.1109/ICCV.2019.00015

BibTeX

@inproceedings{pramono2019iccv-hierarchical,
  title     = {{Hierarchical Self-Attention Network for Action Localization in Videos}},
  author    = {Pramono, Rizard Renanda Adhi and Chen, Yie-Tarng and Fang, Wen-Hsien},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00015},
  url       = {https://mlanthology.org/iccv/2019/pramono2019iccv-hierarchical/}
}