VideoLT: Large-Scale Long-Tailed Video Recognition

Zhang, Xing; Wu, Zuxuan; Weng, Zejia; Fu, Huazhu; Chen, Jingjing; Jiang, Yu-Gang; Davis, Larry S.

doi:10.1109/ICCV48922.2021.00786


ICCV 2021 pp. 7960-7969


Abstract

Label distributions in the real world are often long-tailed and imbalanced, which yields models biased towards dominant labels. While long-tailed recognition has been studied extensively for image classification, limited effort has been devoted to the video domain. In this paper, we introduce VideoLT, a large-scale long-tailed video recognition dataset, as a step toward real-world video recognition. VideoLT contains 256,218 untrimmed videos annotated into 1,004 classes with a long-tailed distribution. Through extensive studies, we demonstrate that state-of-the-art methods for long-tailed image recognition do not perform well in the video domain because of the additional temporal dimension in video data. This motivates us to propose FrameStack, a simple yet effective method for the long-tailed video recognition task. In particular, FrameStack performs sampling at the frame level to balance class distributions, and the sampling ratio is determined dynamically using knowledge derived from the network during training. Experimental results demonstrate that FrameStack can improve classification performance without sacrificing overall accuracy. Code and dataset are available at: https://github.com/17Skye17/VideoLT.
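The abstract describes FrameStack only at a high level: sampling happens per frame rather than per video, and the sampling ratio depends on what the network currently knows. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation (see the linked repository for that). It assumes the "knowledge derived from the network" is a running per-class average precision (AP) in [0, 1]. The names `class_frame_budgets` and `sample_frames`, the parameters `base_frames`, `min_frames` and `max_frames`, and the scaling rule are all hypothetical. Under this rule, classes with low AP receive more frames per video (oversampling) and classes with high AP receive fewer (undersampling).

```python
import random
from typing import Dict, List


def class_frame_budgets(running_ap: Dict[str, float], base_frames: int = 60,
                        min_frames: int = 15, max_frames: int = 120) -> Dict[str, int]:
    """Map each class's running AP to a per-video frame budget.

    Hypothetical rule (an assumption, not the paper's formula): scale
    base_frames by (1 - ap) / mean(1 - ap), then clip the result to
    [min_frames, max_frames].
    """
    if not running_ap:
        return {}
    deficits = {c: 1.0 - ap for c, ap in running_ap.items()}
    mean_deficit = sum(deficits.values()) / len(deficits)
    budgets = {}
    for c, d in deficits.items():
        # If every class is perfect (mean deficit 0), fall back to the base budget.
        ratio = d / mean_deficit if mean_deficit > 0 else 1.0
        budgets[c] = max(min_frames, min(max_frames, round(base_frames * ratio)))
    return budgets


def sample_frames(num_frames: int, budget: int, rng: random.Random) -> List[int]:
    """Choose `budget` frame indices from a video with `num_frames` frames.

    Sample without replacement when the video is long enough (undersampling).
    Otherwise sample with replacement (oversampling).
    """
    if budget <= num_frames:
        return sorted(rng.sample(range(num_frames), budget))
    return sorted(rng.choices(range(num_frames), k=budget))


if __name__ == "__main__":
    # Hypothetical running AP values for three classes.
    running_ap = {"head_class": 0.9, "mid_class": 0.6, "tail_class": 0.2}
    budgets = class_frame_budgets(running_ap)
    print(budgets)  # {'head_class': 15, 'mid_class': 55, 'tail_class': 111}
    rng = random.Random(0)
    print(len(sample_frames(200, budgets["mid_class"], rng)))  # 55
```

The budgets would be recomputed as the running AP changes during training, so the balancing adapts to the model's current weaknesses rather than relying only on fixed class frequencies. Clipping the budget keeps well-learned head classes from disappearing entirely and keeps tail-class oversampling within a fixed compute cost.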


Cite

Text

Zhang et al. "VideoLT: Large-Scale Long-Tailed Video Recognition." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00786

Markdown

[Zhang et al. "VideoLT: Large-Scale Long-Tailed Video Recognition." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/zhang2021iccv-videolt/) doi:10.1109/ICCV48922.2021.00786

BibTeX

@inproceedings{zhang2021iccv-videolt,
  title     = {{VideoLT: Large-Scale Long-Tailed Video Recognition}},
  author    = {Zhang, Xing and Wu, Zuxuan and Weng, Zejia and Fu, Huazhu and Chen, Jingjing and Jiang, Yu-Gang and Davis, Larry S.},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {7960-7969},
  doi       = {10.1109/ICCV48922.2021.00786},
  url       = {https://mlanthology.org/iccv/2021/zhang2021iccv-videolt/}
}