Generalized Many-Way Few-Shot Video Classification

Abstract

Few-shot learning methods operate in low-data regimes, where the aim is to learn with only a few training examples per class. Although significant progress has been made in few-shot image classification, few-shot video recognition is relatively unexplored, and methods based on 2D CNNs are unable to learn temporal information. In this work we therefore develop a simple 3D CNN baseline that surpasses existing methods by a large margin. To circumvent the need for labeled examples, we propose to leverage weakly-labeled videos from a large dataset, using tag retrieval followed by selection of the best clips by visual similarity, which yields further improvement. Our results saturate current 5-way benchmarks for few-shot video classification, and we therefore propose a new, challenging benchmark involving more classes and a mixture of classes with varying supervision.
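The two-stage retrieval mentioned in the abstract (tag retrieval, then clip selection by visual similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the pool layout (`id`, `tags`, `feat`), the use of a mean support feature as the query, and cosine similarity as the ranking score are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_clips(class_name, support_feats, pool, top_k=2):
    """Hypothetical two-stage retrieval for one few-shot class."""
    # Stage 1 (assumed form of tag retrieval): keep weakly-labeled clips
    # whose tag set contains the class name.
    tagged = [clip for clip in pool if class_name in clip["tags"]]
    # Stage 2 (assumed form of visual selection): rank the tagged clips by
    # cosine similarity to the mean feature of the few labeled examples.
    dim = len(support_feats[0])
    query = [sum(f[i] for f in support_feats) / len(support_feats) for i in range(dim)]
    ranked = sorted(tagged, key=lambda clip: cosine(clip["feat"], query), reverse=True)
    return [clip["id"] for clip in ranked[:top_k]]

# Toy pool with made-up 2-D features; real clip features would come from a video network.
pool = [
    {"id": "a", "tags": {"surfing", "beach"}, "feat": [0.9, 0.1]},
    {"id": "b", "tags": {"surfing"}, "feat": [0.1, 0.9]},
    {"id": "c", "tags": {"cooking"}, "feat": [1.0, 0.0]},
    {"id": "d", "tags": {"surfing"}, "feat": [0.8, 0.3]},
]
selected = retrieve_clips("surfing", [[1.0, 0.0], [0.9, 0.2]], pool, top_k=2)
print(selected)
```

In this toy example, clip `c` is visually close to the query but is dropped by the tag filter, while clip `b` carries the right tag but ranks last on visual similarity, which is the kind of noise the second stage is meant to remove.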

Cite

Text

Xian et al. "Generalized Many-Way Few-Shot Video Classification." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-65414-6_10

Markdown

[Xian et al. "Generalized Many-Way Few-Shot Video Classification." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/xian2020eccvw-generalized/) doi:10.1007/978-3-030-65414-6_10

BibTeX

@inproceedings{xian2020eccvw-generalized,
  title     = {{Generalized Many-Way Few-Shot Video Classification}},
  author    = {Xian, Yongqin and Korbar, Bruno and Douze, Matthijs and Schiele, Bernt and Akata, Zeynep and Torresani, Lorenzo},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2020},
  pages     = {111--127},
  doi       = {10.1007/978-3-030-65414-6_10},
  url       = {https://mlanthology.org/eccvw/2020/xian2020eccvw-generalized/}
}