Temporal-Relational CrossTransformers for Few-Shot Action Recognition

Abstract

We propose a novel approach to few-shot action recognition that finds temporally-corresponding frame tuples between the query and the videos in the support set. Unlike previous few-shot work, we construct class prototypes using the CrossTransformer attention mechanism to observe relevant sub-sequences of all support videos, rather than using class averages or single best matches. Video representations are formed from ordered tuples of varying numbers of frames, which allows sub-sequences of actions at different speeds and temporal offsets to be compared. Our proposed Temporal-Relational CrossTransformers (TRX) achieve state-of-the-art results on few-shot splits of Kinetics, Something-Something V2 (SSv2), HMDB51 and UCF101. Importantly, our method outperforms prior work on SSv2 by a wide margin (12%) due to its ability to model temporal relations. A detailed ablation showcases the importance of matching to multiple support set videos and learning higher-order relational CrossTransformers.
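The mechanism described above (ordered frame tuples, a query-specific prototype built by attending over the tuples of all support videos, and matching at several tuple sizes) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes per-frame features are already extracted, uses random stand-in projection matrices (`w_query`, `w_key`, `w_value` are hypothetical names), uses a squared Euclidean distance, and omits details of the full model such as positional encodings, normalisation and training. All function names are illustrative.

```python
import itertools

import numpy as np


def frame_tuples(num_frames, cardinality):
    """All ordered frame-index tuples (i1 < i2 < ... < ik) of the given size."""
    return list(itertools.combinations(range(num_frames), cardinality))


def tuple_features(frames, tuples):
    """Concatenate per-frame features for each tuple: (F, D) -> (T, k*D)."""
    return np.stack([np.concatenate([frames[i] for i in t]) for t in tuples])


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def class_distance(query, support, cardinality, w_query, w_key, w_value):
    """Distance between a query video and the support videos of one class.

    query: (F, D) frame features; support: (S, F, D) for the S videos of a class.
    Each query tuple attends over the tuples of *all* support videos, so the
    prototype is an attention-weighted mix rather than a plain class average.
    """
    tuples = frame_tuples(query.shape[0], cardinality)
    q_tup = tuple_features(query, tuples)                                 # (T, k*D)
    s_tup = np.concatenate([tuple_features(v, tuples) for v in support])  # (S*T, k*D)
    scores = (q_tup @ w_query) @ (s_tup @ w_key).T / np.sqrt(w_key.shape[1])
    attn = softmax(scores, axis=1)                                        # (T, S*T)
    prototype = attn @ (s_tup @ w_value)                                  # (T, d_v)
    # Squared Euclidean distance between query values and the prototype (assumption).
    return float(np.sum((q_tup @ w_value - prototype) ** 2))


def classify(query, support_by_class, weights):
    """Sum distances over tuple cardinalities and return the nearest class.

    weights maps cardinality k -> (w_query, w_key, w_value), each of shape (k*D, d).
    """
    totals = {}
    for label, support in support_by_class.items():
        totals[label] = sum(
            class_distance(query, support, k, w_q, w_k, w_v)
            for k, (w_q, w_k, w_v) in weights.items()
        )
    return min(totals, key=totals.get), totals


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    F, D, d = 8, 16, 32  # frames per video, feature size, projection size
    weights = {
        k: tuple(rng.normal(size=(k * D, d)) / np.sqrt(k * D) for _ in range(3))
        for k in (2, 3)  # pairs and triples of frames, chosen only for illustration
    }
    support_by_class = {c: rng.normal(size=(5, F, D)) for c in ("a", "b")}
    query = rng.normal(size=(F, D))
    label, totals = classify(query, support_by_class, weights)
    print(label, totals)
```

With 8 frames there are 28 ordered pairs and 56 ordered triples, so summing over several cardinalities lets the comparison use sub-sequences of different lengths and temporal offsets. Because every query tuple attends over every support video's tuples, a single poorly matching support video does not dominate the prototype the way a single best match would.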

Cite

Text

Perrett et al. "Temporal-Relational CrossTransformers for Few-Shot Action Recognition." Conference on Computer Vision and Pattern Recognition, 2021. doi:10.1109/CVPR46437.2021.00054

Markdown

[Perrett et al. "Temporal-Relational CrossTransformers for Few-Shot Action Recognition." Conference on Computer Vision and Pattern Recognition, 2021.](https://mlanthology.org/cvpr/2021/perrett2021cvpr-temporalrelational/) doi:10.1109/CVPR46437.2021.00054

BibTeX

@inproceedings{perrett2021cvpr-temporalrelational,
  title     = {{Temporal-Relational CrossTransformers for Few-Shot Action Recognition}},
  author    = {Perrett, Toby and Masullo, Alessandro and Burghardt, Tilo and Mirmehdi, Majid and Damen, Dima},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2021},
  pages     = {475-484},
  doi       = {10.1109/CVPR46437.2021.00054},
  url       = {https://mlanthology.org/cvpr/2021/perrett2021cvpr-temporalrelational/}
}