Learning Video Features for Multi-Label Classification
Abstract
This paper studies approaches to learning video representations. The work was done as part of the YouTube-8M Video Understanding Challenge. The main focus is to analyze various approaches to modeling temporal data and to evaluate their performance on this problem. In addition, a model is proposed that reduces the feature-vector size by 70% without compromising accuracy. The first approach uses recurrent neural network architectures to learn a single video-level feature from frame-level features and then uses this aggregated feature for multi-label classification. The second approach uses video-level features and deep neural networks to assign the labels.
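The first approach in the abstract can be sketched as follows: an RNN consumes the sequence of frame-level features, its final hidden state serves as the single video-level feature, and an independent sigmoid per label performs multi-label classification. This is a minimal NumPy sketch with random (untrained) weights and illustrative dimensions chosen here for brevity; it is not the paper's trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not from the paper);
# YouTube-8M frame features are 1024-d visual vectors per frame.
n_frames, feat_dim, hidden_dim, n_labels = 30, 1024, 256, 10

frames = rng.normal(size=(n_frames, feat_dim))  # frame-level features

# Simple vanilla RNN aggregator: the final hidden state is used as
# the single video-level feature described in the first approach.
W_xh = rng.normal(scale=0.01, size=(feat_dim, hidden_dim))
W_hh = rng.normal(scale=0.01, size=(hidden_dim, hidden_dim))
h = np.zeros(hidden_dim)
for x in frames:
    h = np.tanh(x @ W_xh + h @ W_hh)

# Multi-label head: one sigmoid per label (not a softmax), so each
# label is decided independently and a video can carry several labels.
W_out = rng.normal(scale=0.01, size=(hidden_dim, n_labels))
scores = 1.0 / (1.0 + np.exp(-(h @ W_out)))
predicted = scores > 0.5

print(h.shape, scores.shape)
```

In a trained system the weights would be learned end-to-end with a per-label binary cross-entropy loss; the second approach in the abstract skips the RNN and feeds precomputed video-level features directly into a deep classifier.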
Cite
Text
Garg. "Learning Video Features for Multi-Label Classification." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_30

Markdown
[Garg. "Learning Video Features for Multi-Label Classification." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/garg2018eccvw-learning/) doi:10.1007/978-3-030-11018-5_30

BibTeX
@inproceedings{garg2018eccvw-learning,
title = {{Learning Video Features for Multi-Label Classification}},
author = {Garg, Shivam},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
pages = {325--337},
doi = {10.1007/978-3-030-11018-5_30},
url = {https://mlanthology.org/eccvw/2018/garg2018eccvw-learning/}
}