Building a Size Constrained Predictive Models for Video Classification
Abstract
We present our solution to the $2^\mathrm{nd}$ YouTube-8M Video Understanding Challenge, which placed $1^\mathrm{st}$. Competition participants were tasked with building a size-constrained video labeling model of less than 1 GB. Our final solution consists of several submodels belonging to the Fisher vector, NetVLAD, Deep Bag of Frames, and recurrent neural network model families. To make the classifier efficient under the size constraint, we introduced model distillation, partial weight quantization, and training with an exponential moving average.
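As a rough illustration of two of the techniques named in the abstract, the sketch below shows an exponential moving average of weights and a uniform 8-bit quantization applied only to large tensors. It is not the authors' implementation: the function names, the decay value, the uniform min/max quantization scheme, and the `min_size` threshold are all assumptions chosen for the example.

```python
def ema_update(shadow, weights, decay=0.999):
    """One EMA step: shadow <- decay * shadow + (1 - decay) * weights."""
    return [decay * s + (1.0 - decay) * w for s, w in zip(shadow, weights)]


def quantize_uint8(values):
    """Map floats uniformly onto integer codes 0..255; return (codes, lo, step)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 for constant tensors
    codes = [min(255, max(0, round((v - lo) / step))) for v in values]
    return codes, lo, step


def dequantize_uint8(codes, lo, step):
    """Recover approximate floats from 8-bit codes."""
    return [lo + c * step for c in codes]


def partially_quantize(tensors, min_size=4):
    """Quantize only tensors with at least min_size elements (hypothetical threshold).

    Small tensors are kept in full precision, since they contribute little
    to the total model size.
    """
    out = {}
    for name, values in tensors.items():
        if len(values) >= min_size:
            out[name] = ("uint8", quantize_uint8(values))
        else:
            out[name] = ("float", list(values))
    return out
```

In this sketch the EMA copy of the weights would be the one saved for inference, and the quantized tensors would be dequantized at load time; how the paper combines these steps is not stated in the abstract.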
Cite
Text
Skalic and Austin. "Building a Size Constrained Predictive Models for Video Classification." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_27
Markdown
[Skalic and Austin. "Building a Size Constrained Predictive Models for Video Classification." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/skalic2018eccvw-building/) doi:10.1007/978-3-030-11018-5_27
BibTeX
@inproceedings{skalic2018eccvw-building,
title = {{Building a Size Constrained Predictive Models for Video Classification}},
author = {Skalic, Miha and Austin, David},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
pages = {297-305},
doi = {10.1007/978-3-030-11018-5_27},
url = {https://mlanthology.org/eccvw/2018/skalic2018eccvw-building/}
}