Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications

Abstract

Trained on large datasets, deep learning (DL) can accurately classify videos into hundreds of diverse classes. However, video data is expensive to annotate. Zero-shot learning (ZSL) proposes one solution to this problem. ZSL trains a model once, and generalizes to new tasks whose classes are not present in the training dataset. We propose the first end-to-end algorithm for ZSL in video classification. Our training procedure builds on insights from recent video classification literature and uses a trainable 3D CNN to learn the visual features. This is in contrast to previous video ZSL methods, which use pretrained feature extractors. We also extend the current benchmarking paradigm: previous techniques aim to make the test task unknown at training time but fall short of this goal. We encourage domain shift across training and test data and disallow tailoring a ZSL model to a specific test dataset. We outperform the state of the art by a wide margin. Our code, evaluation procedure and model weights are available online at github.com/bbrattoli/ZeroShotVideoClassification.
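
The abstract does not spell out how a trained model assigns a label to a class it never saw during training. A common setup in the ZSL literature, assumed here for illustration and not taken from the paper, maps each video to a vector in the same space as class-name embeddings and picks the nearest class. The minimal sketch below uses hypothetical names (`zero_shot_classify`, `unseen_classes`) and made-up 3-dimensional vectors; in practice the video vector would come from the trained network and the class vectors from a word-embedding model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(video_embedding, class_embeddings):
    """Return the class name whose embedding is most similar to the video.

    Assumption: video and class embeddings live in the same vector space.
    """
    return max(
        class_embeddings,
        key=lambda name: cosine_similarity(video_embedding, class_embeddings[name]),
    )


# Toy data (made up): two test classes that were absent from training.
unseen_classes = {
    "juggling": [0.9, 0.1, 0.0],
    "surfing": [0.0, 0.8, 0.6],
}

# Stand-in for the output of a trained video network on one test clip.
video_embedding = [0.1, 0.7, 0.7]

predicted = zero_shot_classify(video_embedding, unseen_classes)
print(predicted)
```

Because the class list is only an argument at inference time, the same trained model can be pointed at a new label set without retraining, which is the property the abstract describes.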

Cite

Text

Brattoli et al. "Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020. doi:10.1109/CVPR42600.2020.00467

Markdown

[Brattoli et al. "Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020.](https://mlanthology.org/cvpr/2020/brattoli2020cvpr-rethinking/) doi:10.1109/CVPR42600.2020.00467

BibTeX

@inproceedings{brattoli2020cvpr-rethinking,
  title     = {{Rethinking Zero-Shot Video Classification: End-to-End Training for Realistic Applications}},
  author    = {Brattoli, Biagio and Tighe, Joseph and Zhdanov, Fedor and Perona, Pietro and Chalupka, Krzysztof},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2020},
  doi       = {10.1109/CVPR42600.2020.00467},
  url       = {https://mlanthology.org/cvpr/2020/brattoli2020cvpr-rethinking/}
}