Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

Moon, WonJun; Seong, Hyun Seok; Heo, Jae-Pil

doi:10.1609/AAAI.V37I2.25284

AAAI 2023 pp. 1931-1939

Abstract

A dramatic increase in real-world video volume, covering extremely diverse and emerging topics, naturally forms a long-tailed distribution over video categories, which highlights the need for Video Long-Tailed Recognition (VLTR). In this work, we summarize the challenges in VLTR and explore how to overcome them. The challenges are: (1) it is impractical to re-train the whole model to obtain high-quality features, (2) acquiring frame-wise labels is very costly, and (3) long-tailed data triggers biased training. Yet most existing works for VLTR unavoidably rely on image-level features extracted from pretrained models, which are task-irrelevant, and learn from video-level labels. Therefore, to deal with (1) task-irrelevant features and (2) video-level labels, we introduce two complementary learnable feature aggregators. The learnable layers in each aggregator produce task-relevant representations, and each aggregator assembles the snippet-wise knowledge into a video-level representation. Then, we propose Minority-Oriented Vicinity Expansion (MOVE), which explicitly incorporates class frequency when approximating the vicinity distributions in order to alleviate (3) biased training. By combining these solutions, our approach achieves state-of-the-art results on the large-scale VideoLT and the synthetically induced Imbalanced-MiniKinetics200. With VideoLT features from ResNet-50, it attains 18% and 58% relative improvements on head and tail classes, respectively, over the previous state-of-the-art method. Code and dataset are available at https://github.com/wjun0830/MOVE.
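The abstract names two ideas without giving their formulas, so the following is only a minimal, hypothetical Python sketch of each under stated assumptions: attention-weighted pooling of snippet-level features into one video-level vector, and a Mixup-style (vicinal) interpolation whose coefficient is reweighted by inverse class frequency so that the tail-class sample dominates the mixed example. The names `attentive_aggregate` and `frequency_aware_mix`, the dot-product scoring against a fixed `query` vector, and the 1/count reweighting are all illustrative assumptions; this is not the authors' aggregator design or the MOVE formulation, which are available in the linked repository.

```python
# Hypothetical sketch: NOT the aggregators or MOVE as implemented in the paper.
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_aggregate(snippets, query):
    """Pool T snippet feature vectors (each of length D) into one vector.

    Each snippet is scored by its dot product with `query` (an assumed
    stand-in for a learnable parameter, fixed here); the softmax-normalized
    scores weight the average. Returns (video_vector, attention_weights).
    """
    scores = [sum(q * x for q, x in zip(query, s)) for s in snippets]
    weights = softmax(scores)
    dim = len(snippets[0])
    video = [sum(w * s[d] for w, s in zip(weights, snippets)) for d in range(dim)]
    return video, weights


def frequency_aware_mix(x_a, y_a, x_b, y_b, class_counts, lam):
    """Hypothetical inverse-frequency Mixup of two video-level vectors.

    The Mixup coefficient `lam` is reweighted by 1 / class count, so the
    sample from the rarer class keeps a larger share of the mixed example.
    Returns (mixed_features, soft_label) with soft_label over all classes.
    """
    w_a = lam / class_counts[y_a]
    w_b = (1.0 - lam) / class_counts[y_b]
    lam_a = w_a / (w_a + w_b)  # effective weight of sample a after reweighting
    mixed = [lam_a * a + (1.0 - lam_a) * b for a, b in zip(x_a, x_b)]
    label = [0.0] * len(class_counts)
    label[y_a] += lam_a
    label[y_b] += 1.0 - lam_a
    return mixed, label


if __name__ == "__main__":
    rng = random.Random(0)
    class_counts = [1000, 10]  # toy counts: class 0 is a head class, class 1 a tail class
    # Two toy videos, each with 3 snippets of 4-D features.
    vid_a = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    vid_b = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(3)]
    query = [0.5, -0.2, 0.1, 0.3]
    x_a, _ = attentive_aggregate(vid_a, query)
    x_b, _ = attentive_aggregate(vid_b, query)
    mixed, label = frequency_aware_mix(x_a, 0, x_b, 1, class_counts, lam=0.5)
    print(label)  # the tail class (index 1) receives most of the label mass
```

Reweighting by inverse frequency is just one simple way to push synthetic samples toward minority classes; in a trained model the query (and any projection layers) would typically be learned jointly with the classifier rather than fixed.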

Cite

Text

Moon et al. "Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I2.25284

Markdown

[Moon et al. "Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/moon2023aaai-minority/) doi:10.1609/AAAI.V37I2.25284

BibTeX

@inproceedings{moon2023aaai-minority,
  title     = {{Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition}},
  author    = {Moon, WonJun and Seong, Hyun Seok and Heo, Jae-Pil},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {1931-1939},
  doi       = {10.1609/AAAI.V37I2.25284},
  url       = {https://mlanthology.org/aaai/2023/moon2023aaai-minority/}
}