The 2nd YouTube-8M Large-Scale Video Understanding Challenge
Abstract
We hosted the 2nd YouTube-8M Large-Scale Video Understanding Kaggle Challenge and Workshop at ECCV’18, with the task of classifying videos from frame-level and video-level audio-visual features. In this year’s challenge, we restricted the final model size to 1 GB or less, encouraging participants to explore representation learning or better architectures instead of heavy ensembles of multiple models. In this paper, we briefly introduce the YouTube-8M dataset and challenge task, followed by participant statistics and result analysis. We summarize the ideas proposed by participants, including architectures, temporal aggregation methods, ensembling and distillation, data augmentation, and more.
Cite
Text
Lee et al. "The 2nd YouTube-8M Large-Scale Video Understanding Challenge." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_18
Markdown
[Lee et al. "The 2nd YouTube-8M Large-Scale Video Understanding Challenge." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/lee2018eccvw-2nd/) doi:10.1007/978-3-030-11018-5_18
BibTeX
@inproceedings{lee2018eccvw-2nd,
title = {{The 2nd YouTube-8M Large-Scale Video Understanding Challenge}},
author = {Lee, Joonseok and Natsev, Apostol and Reade, Walter and Sukthankar, Rahul and Toderici, George},
booktitle = {European Conference on Computer Vision Workshops},
year = {2018},
pages = {193--205},
doi = {10.1007/978-3-030-11018-5_18},
url = {https://mlanthology.org/eccvw/2018/lee2018eccvw-2nd/}
}