Few-Shot Action Recognition with Permutation-Invariant Attention

Zhang, Hongguang; Zhang, Li; Qi, Xiaojuan; Li, Hongdong; Torr, Philip H. S.; Koniusz, Piotr

doi:10.1007/978-3-030-58558-7_31

ECCV 2020

Abstract

Many few-shot learning models focus on recognising images. In contrast, we tackle the challenging task of few-shot action recognition from videos. We build on a C3D encoder for spatio-temporal video blocks to capture short-range action patterns. The encoded blocks are aggregated by permutation-invariant pooling, which makes our approach robust to varying action lengths and to long-range temporal dependencies whose patterns are unlikely to repeat even in clips of the same class. The pooled representations are then combined into simple relation descriptors that encode the so-called query and support clips. Finally, the relation descriptors are fed to a comparator whose goal is to learn the similarity between query and support clips. Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision. In naturalistic clips of the same class there exists a temporal distribution shift: the locations of discriminative temporal action hotspots vary. Thus, we permute the blocks of a clip and align the resulting attention regions with the similarly permuted attention regions of the non-permuted clip, training the attention mechanism to be invariant to block (and thus long-term hotspot) permutations. Our method outperforms the state of the art on the HMDB51, UCF101, and miniMIT datasets.
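The two ideas in the abstract, attention-weighted pooling over blocks and aligning the attention of a permuted clip with the permuted attention of the original, can be illustrated with a small pure-Python sketch. This is not the authors' implementation. Random vectors stand in for C3D block features, the dot-product attention score, the positional bias, and the squared-error alignment loss are assumptions made for illustration, and every function name is hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def block_scores(blocks, w):
    """Hypothetical attention score per block: dot product with a vector w."""
    return [sum(x * wi for x, wi in zip(block, w)) for block in blocks]


def attentive_pool(blocks, weights):
    """Weighted sum of block features; the order of blocks does not matter."""
    dim = len(blocks[0])
    return [sum(a * b[d] for a, b in zip(weights, blocks)) for d in range(dim)]


def alignment_loss(blocks, perm, attend):
    """Squared error between the attention computed on the permuted clip and
    the permuted attention of the original clip (assumed form of the loss)."""
    original = attend(blocks)
    on_permuted = attend([blocks[i] for i in perm])
    target = [original[i] for i in perm]
    return sum((a - t) ** 2 for a, t in zip(on_permuted, target))


# Stand-in for encoded video blocks: 5 blocks with 4 features each.
random.seed(0)
blocks = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
w = [0.5, -0.2, 0.1, 0.3]   # hypothetical attention parameters
perm = [3, 0, 4, 1, 2]      # a fixed block permutation


def content_attention(bl):
    """Attention that depends only on block content."""
    return softmax(block_scores(bl, w))


def position_attention(bl):
    """Attention with an added bias that grows with the block index."""
    return softmax([s + 0.5 * t for t, s in enumerate(block_scores(bl, w))])


weights = content_attention(blocks)
pooled = attentive_pool(blocks, weights)

permuted_blocks = [blocks[i] for i in perm]
pooled_perm = attentive_pool(permuted_blocks, content_attention(permuted_blocks))

loss_content = alignment_loss(blocks, perm, content_attention)
loss_position = alignment_loss(blocks, perm, position_attention)
```

In this sketch, `pooled` and `pooled_perm` agree up to floating-point rounding, and `loss_content` is effectively zero because the content-only attention already commutes with block permutations. The position-biased attention yields a positive `loss_position`, which is the kind of discrepancy an alignment objective would penalise during training.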

Cite

Text

Zhang et al. "Few-Shot Action Recognition with Permutation-Invariant Attention." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58558-7_31

Markdown

[Zhang et al. "Few-Shot Action Recognition with Permutation-Invariant Attention." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/zhang2020eccv-fewshot/) doi:10.1007/978-3-030-58558-7_31

BibTeX

@inproceedings{zhang2020eccv-fewshot,
  title     = {{Few-Shot Action Recognition with Permutation-Invariant Attention}},
  author    = {Zhang, Hongguang and Zhang, Li and Qi, Xiaojuan and Li, Hongdong and Torr, Philip H. S. and Koniusz, Piotr},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58558-7_31},
  url       = {https://mlanthology.org/eccv/2020/zhang2020eccv-fewshot/}
}