Extreme Low Resolution Action Recognition with Spatial-Temporal Multi-Head Self-Attention and Knowledge Distillation

Purwanto, Didik; Pramono, Rizard Renanda Adhi; Chen, Yie-Tarng; Fang, Wen-Hsien

doi:10.1109/ICCVW.2019.00125

ICCVW 2019 pp. 961-969

Abstract

This paper proposes a two-stream network with a novel spatial-temporal multi-head self-attention mechanism for action recognition in extreme low resolution (LR) videos. The approach first applies a super resolution (SR) mechanism to recover better visual information and thereby facilitate network training. To obtain more discriminative spatio-temporal features, a knowledge distillation scheme consisting of teacher and student models transfers knowledge from a high resolution (HR) model to the LR network. Moreover, the two-stream network is combined with the new spatial-temporal multi-head self-attention network to effectively learn long-term temporal dependencies. Experiments show that the proposed method outperforms state-of-the-art approaches for extreme LR action recognition on two widely used datasets, HMDB-51 and IXMAS.
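The abstract names two mechanisms without giving their equations: knowledge distillation from an HR teacher to an LR student, and multi-head self-attention over time. The following is a minimal sketch, assuming standard scaled dot-product multi-head attention over per-frame feature vectors and the common Hinton-style soft-target distillation loss. Neither is confirmed as the paper's exact formulation. It shows how each frame of a clip can attend to every other frame, and how an LR student can be pulled toward an HR teacher's softened predictions. All function names are hypothetical, as are the temperature 4.0, the weight alpha = 0.5, the random weights and the toy logits.

```python
import math
import random


def softmax(xs, temperature=1.0):
    """Numerically stable softmax; temperature > 1 softens the distribution."""
    scaled = [x / temperature for x in xs]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def temporal_multi_head_self_attention(x, wq, wk, wv, num_heads):
    """Scaled dot-product self-attention across the T frames of a clip.

    x: T x d per-frame features (e.g. fused two-stream features; an assumption).
    wq, wk, wv: d x d query/key/value projections. Each head works on a
    d/num_heads slice and the heads are concatenated (output projection omitted).
    """
    t_len, d = len(x), len(x[0])
    if d % num_heads != 0:
        raise ValueError("feature dim must be divisible by num_heads")
    dh = d // num_heads
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    out = [[0.0] * d for _ in range(t_len)]
    for h in range(num_heads):
        cols = range(h * dh, (h + 1) * dh)
        for i in range(t_len):
            # Frame i scores every frame j, so distant frames are reachable in one step.
            scores = [sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(dh)
                      for j in range(t_len)]
            weights = softmax(scores)
            for c in cols:
                out[i][c] = sum(weights[j] * v[j][c] for j in range(t_len))
    return out


def distillation_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    """Hinton-style KD (assumed): KL(teacher || student) on softened outputs plus hard-label CE."""
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    ce = -math.log(softmax(student_logits)[label])
    # Scaling by T^2 keeps the soft-target term comparable in magnitude to the CE term.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


def random_matrix(rows, cols, rng):
    """Hypothetical random weights standing in for learned projections."""
    return [[rng.gauss(0.0, rows ** -0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D, HEADS = 8, 16, 4
frames = [[rng.gauss(0.0, 1.0) for _ in range(D)] for _ in range(T)]
attended = temporal_multi_head_self_attention(
    frames, random_matrix(D, D, rng), random_matrix(D, D, rng),
    random_matrix(D, D, rng), HEADS)
clip_feature = [sum(col) / T for col in zip(*attended)]  # temporal average pooling

# HR teacher vs. LR student logits for a toy 3-class problem (made-up numbers).
loss = distillation_loss([1.0, 0.5, -0.2], teacher_logits=[2.0, 0.1, -1.0], label=0)
```

Because every frame scores every other frame, a single attention layer covers the whole clip. This makes self-attention a natural fit for long-term temporal dependencies. The cost is quadratic in the number of frames.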

Cite

Text

Purwanto et al. "Extreme Low Resolution Action Recognition with Spatial-Temporal Multi-Head Self-Attention and Knowledge Distillation." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00125

Markdown

[Purwanto et al. "Extreme Low Resolution Action Recognition with Spatial-Temporal Multi-Head Self-Attention and Knowledge Distillation." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/purwanto2019iccvw-extreme/) doi:10.1109/ICCVW.2019.00125

BibTeX

@inproceedings{purwanto2019iccvw-extreme,
  title     = {{Extreme Low Resolution Action Recognition with Spatial-Temporal Multi-Head Self-Attention and Knowledge Distillation}},
  author    = {Purwanto, Didik and Pramono, Rizard Renanda Adhi and Chen, Yie-Tarng and Fang, Wen-Hsien},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {961--969},
  doi       = {10.1109/ICCVW.2019.00125},
  url       = {https://mlanthology.org/iccvw/2019/purwanto2019iccvw-extreme/}
}