Learning Video Features for Multi-Label Classification

Abstract

This paper studies approaches to learning video representations. The work was done as part of the YouTube-8M Video Understanding Challenge. The main focus is to analyze various approaches to modeling temporal data and to evaluate their performance on this problem. A model is also proposed that reduces the feature vector size by 70% without compromising accuracy. The first approach uses recurrent neural network architectures to learn a single video-level feature from frame-level features and then uses this aggregated feature for multi-label classification. The second approach uses video-level features and deep neural networks to assign the labels.
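The first approach described above can be sketched as follows: a recurrent network consumes the sequence of frame-level features and its final hidden state serves as the video-level feature, which a per-label sigmoid classifier then scores. This is a minimal illustration in pure Python with a vanilla RNN cell and toy dimensions; the actual architectures, feature sizes, and training procedure in the paper differ, and all names and weights here are hypothetical.

```python
import math
import random

random.seed(0)

def rnn_aggregate(frames, W_x, W_h, b):
    """Aggregate frame-level features into one video-level feature.

    Vanilla RNN step: h_t = tanh(W_x x_t + W_h h_{t-1} + b).
    The final hidden state is used as the video-level feature.
    """
    h = [0.0] * len(b)
    for x in frames:
        h = [math.tanh(sum(W_x[i][j] * x[j] for j in range(len(x)))
                       + sum(W_h[i][k] * h[k] for k in range(len(h)))
                       + b[i])
             for i in range(len(b))]
    return h

def classify(h, W_out, b_out):
    """Multi-label head: independent sigmoid per label (labels are not exclusive)."""
    return [1.0 / (1.0 + math.exp(-(sum(W_out[c][i] * h[i] for i in range(len(h)))
                                    + b_out[c])))
            for c in range(len(b_out))]

# Toy dimensions (hypothetical): 4-dim frame features, 3-dim hidden state, 5 labels.
D, H, C = 4, 3, 5
rand_mat = lambda rows, cols: [[random.uniform(-0.5, 0.5) for _ in range(cols)]
                               for _ in range(rows)]
W_x, W_h, b = rand_mat(H, D), rand_mat(H, H), [0.0] * H
W_out, b_out = rand_mat(C, H), [0.0] * C

# A video as a sequence of 10 random frame-level feature vectors.
frames = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(10)]
video_feat = rnn_aggregate(frames, W_x, W_h, b)   # single video-level feature
probs = classify(video_feat, W_out, b_out)        # one probability per label
print(len(video_feat), len(probs))  # prints: 3 5
```

In practice the weights would be trained end to end with a per-label binary cross-entropy loss, since each video can carry several labels at once.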

Cite

Text

Garg. "Learning Video Features for Multi-Label Classification." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_30

Markdown

[Garg. "Learning Video Features for Multi-Label Classification." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/garg2018eccvw-learning/) doi:10.1007/978-3-030-11018-5_30

BibTeX

@inproceedings{garg2018eccvw-learning,
  title     = {{Learning Video Features for Multi-Label Classification}},
  author    = {Garg, Shivam},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {325--337},
  doi       = {10.1007/978-3-030-11018-5_30},
  url       = {https://mlanthology.org/eccvw/2018/garg2018eccvw-learning/}
}