Use Your Head: Improving Long-Tail Video Recognition

Abstract

This paper presents an investigation into long-tail video recognition. We demonstrate that, unlike naturally-collected video datasets and existing long-tail image benchmarks, current video benchmarks fall short on multiple long-tailed properties. Most critically, they lack few-shot classes in their tails. In response, we propose new video benchmarks that better assess long-tail recognition by sampling subsets from two datasets: SSv2 and VideoLT. We then propose a method, Long-Tail Mixed Reconstruction (LMR), which reduces overfitting to instances from few-shot classes by reconstructing them as weighted combinations of samples from head classes. LMR then employs label mixing to learn robust decision boundaries. It achieves state-of-the-art average class accuracy on EPIC-KITCHENS and the proposed SSv2-LT and VideoLT-LT. Benchmarks and code at: github.com/tobyperrett/lmr
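
To make the abstract's mechanism concrete, below is a minimal, hypothetical PyTorch sketch of the core idea, not the authors' released implementation (see github.com/tobyperrett/lmr for that): a few-shot sample's feature is reconstructed as a similarity-weighted combination of head-class features, and the feature and its label are then mixed. The function name reconstruct_and_mix and the parameters tau (softmax temperature) and lam (mixing coefficient) are illustrative assumptions, not names from the paper.

import torch
import torch.nn.functional as F

def reconstruct_and_mix(tail_feat, head_feats, tail_label, head_labels,
                        tau=0.1, lam=0.5):
    # tail_feat:   (D,)   feature of one few-shot (tail-class) sample.
    # head_feats:  (N, D) features drawn from head classes.
    # tail_label:  (C,)   one-hot label of the tail sample.
    # head_labels: (N, C) one-hot labels of the head samples.
    # Reconstruction weights: softmax over cosine similarities to head samples.
    sims = F.cosine_similarity(tail_feat.unsqueeze(0), head_feats, dim=1)
    w = F.softmax(sims / tau, dim=0)             # (N,)
    recon = w @ head_feats                       # weighted combination of head features
    # Mix the original tail feature with its reconstruction, and mix labels to match.
    mixed_feat = lam * tail_feat + (1.0 - lam) * recon
    mixed_label = lam * tail_label + (1.0 - lam) * (w @ head_labels)
    return mixed_feat, mixed_label

Blending rather than replacing the tail feature keeps some class-specific signal, and the correspondingly mixed label softens the decision boundary around few-shot classes, which is the overfitting reduction the abstract describes.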

Cite

Text

Perrett et al. "Use Your Head: Improving Long-Tail Video Recognition." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00239

Markdown

[Perrett et al. "Use Your Head: Improving Long-Tail Video Recognition." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/perrett2023cvpr-use/) doi:10.1109/CVPR52729.2023.00239

BibTeX

@inproceedings{perrett2023cvpr-use,
  title     = {{Use Your Head: Improving Long-Tail Video Recognition}},
  author    = {Perrett, Toby and Sinha, Saptarshi and Burghardt, Tilo and Mirmehdi, Majid and Damen, Dima},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {2415--2425},
  doi       = {10.1109/CVPR52729.2023.00239},
  url       = {https://mlanthology.org/cvpr/2023/perrett2023cvpr-use/}
}