Building a Size Constrained Predictive Models for Video Classification

Abstract

Herein we present the solution to the $2^\mathrm{nd}$ YouTube-8M video understanding challenge, which placed $1^\mathrm{st}$. Competition participants were tasked with building a size-constrained video labeling model of less than 1 GB. Our final solution consists of several submodels belonging to the Fisher vector, NetVLAD, Deep Bag of Frames, and recurrent neural network model families. To make the classifier efficient under the size constraint, we introduced model distillation, partial weight quantization, and training with an exponential moving average.
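
As a rough illustration of two of the techniques named in the abstract, the sketch below shows a generic exponential moving average of weights and a generic uniform 8-bit quantization. It is not the authors' implementation: the function names, the decay value, and the min/max quantization scheme are all assumptions chosen for illustration, and the abstract does not specify which weights were quantized or how.

```python
# Illustrative sketch only; names and parameters are hypothetical,
# not taken from the paper.

def ema_update(shadow, weights, decay=0.999):
    """Return the updated moving average of the weights.

    shadow  -- current averaged weights (list of floats)
    weights -- weights after the latest training step
    decay   -- assumed smoothing factor; closer to 1 means slower updates
    """
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


def quantize_uint8(weights):
    """Map floats to integer codes in 0..255 plus the (lo, step) needed to decode."""
    lo, hi = min(weights), max(weights)
    # Fall back to a step of 1.0 when all weights are equal.
    step = (hi - lo) / 255.0 or 1.0
    codes = [round((w - lo) / step) for w in weights]
    return codes, lo, step


def dequantize_uint8(codes, lo, step):
    """Recover approximate float weights from 8-bit codes."""
    return [lo + c * step for c in codes]
```

Storing one byte per weight instead of a 32-bit float reduces the stored size of the quantized tensors by roughly a factor of four, at the cost of a reconstruction error of at most half a quantization step per weight.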

Cite

Text

Skalic and Austin. "Building a Size Constrained Predictive Models for Video Classification." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_27

Markdown

[Skalic and Austin. "Building a Size Constrained Predictive Models for Video Classification." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/skalic2018eccvw-building/) doi:10.1007/978-3-030-11018-5_27

BibTeX

@inproceedings{skalic2018eccvw-building,
  title     = {{Building a Size Constrained Predictive Models for Video Classification}},
  author    = {Skalic, Miha and Austin, David},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {297-305},
  doi       = {10.1007/978-3-030-11018-5_27},
  url       = {https://mlanthology.org/eccvw/2018/skalic2018eccvw-building/}
}