The 2nd YouTube-8M Large-Scale Video Understanding Challenge

Abstract

We hosted the 2nd YouTube-8M Large-Scale Video Understanding Kaggle Challenge and Workshop at ECCV’18, with the task of classifying videos from frame-level and video-level audio-visual features. In this year’s challenge, we restricted the final model size to 1 GB or less, encouraging participants to explore representation learning or better architectures instead of heavy ensembles of multiple models. In this paper, we briefly introduce the YouTube-8M dataset and challenge task, followed by participant statistics and result analysis. We summarize the ideas proposed by participants, including architectures, temporal aggregation methods, ensembling and distillation, data augmentation, and more.

Cite

Text

Lee et al. "The 2nd YouTube-8M Large-Scale Video Understanding Challenge." European Conference on Computer Vision Workshops, 2018. doi:10.1007/978-3-030-11018-5_18

Markdown

[Lee et al. "The 2nd YouTube-8M Large-Scale Video Understanding Challenge." European Conference on Computer Vision Workshops, 2018.](https://mlanthology.org/eccvw/2018/lee2018eccvw-2nd/) doi:10.1007/978-3-030-11018-5_18

BibTeX

@inproceedings{lee2018eccvw-2nd,
  title     = {{The 2nd YouTube-8M Large-Scale Video Understanding Challenge}},
  author    = {Lee, Joonseok and Natsev, Apostol and Reade, Walter and Sukthankar, Rahul and Toderici, George},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2018},
  pages     = {193--205},
  doi       = {10.1007/978-3-030-11018-5_18},
  url       = {https://mlanthology.org/eccvw/2018/lee2018eccvw-2nd/}
}