Curriculum Multi-Negative Augmentation for Debiased Video Grounding

Abstract

Video Grounding (VG) aims to locate the segment of a video described by a sentence query. Recent studies have found that current VG models tend to over-rely on the ground-truth moment annotation distribution biases in the training set. To discourage a standard VG model from exploiting such temporal annotation biases and to improve its generalization ability, we propose multiple negative augmentations organized hierarchically, including cross-video augmentations at the clip and video levels, and self-shuffled augmentations with masks. These augmentations can effectively diversify the data distribution so that the model makes more reasonable predictions instead of merely fitting the temporal biases. However, directly adopting such a data augmentation strategy may inevitably introduce some noise, as shown in our case examples, since not all of the handcrafted augmentations are semantically irrelevant to the ground-truth video. To denoise the augmentations and further improve grounding accuracy, we design a multi-stage curriculum strategy that adaptively trains the standard VG model from easy to hard negative augmentations. Experiments on the newly collected Charades-CD and ActivityNet-CD datasets demonstrate that our proposed strategy improves the performance of the base model in both i.i.d. and o.o.d. scenarios.
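The abstract names three families of negatives and an easy-to-hard schedule but gives no implementation details. The following is a minimal sketch under stated assumptions, not the authors' method: a video is a list of clip feature vectors, the ground truth is a clip index span, and every name (`Video`, `Span`, `video_level_negative`, `clip_level_negative`, `self_shuffled_negative`, `CURRICULUM`, `allowed_negatives`, `make_negatives`, `stage_starts`) is hypothetical. Masking by zeroing the ground-truth clips and the easy-to-hard ordering (video level, then clip level, then self-shuffled) are assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

Clip = List[float]      # hypothetical: one clip feature vector
Video = List[Clip]      # hypothetical: a video as an ordered list of clips
Span = Tuple[int, int]  # hypothetical: ground-truth clips [start, end), end exclusive


def video_level_negative(others: Sequence[Video], rng: random.Random) -> Video:
    """Cross-video, video level: pair the query with a different video (copied)."""
    return [list(c) for c in rng.choice(others)]


def clip_level_negative(video: Video, span: Span, other: Video,
                        rng: random.Random) -> Video:
    """Cross-video, clip level: overwrite the ground-truth clips with clips
    sampled from another video, so the usual temporal location no longer
    contains the queried content."""
    start, end = span
    out = [list(c) for c in video]
    for i in range(start, end):
        out[i] = list(rng.choice(other))
    return out


def self_shuffled_negative(video: Video, span: Span,
                           rng: random.Random) -> Video:
    """Self-shuffled with mask (assumed form): zero out the ground-truth clips,
    then shuffle clip order so no position keeps its original content."""
    start, end = span
    dim = len(video[0])
    out = [([0.0] * dim if start <= i < end else list(c))
           for i, c in enumerate(video)]
    rng.shuffle(out)
    return out


# Assumed easy-to-hard order; the paper's difficulty measure may differ.
CURRICULUM = ["video", "clip", "self_shuffled"]


def allowed_negatives(epoch: int, stage_starts: Sequence[int]) -> List[str]:
    """Multi-stage schedule: stage_starts[k] is the first epoch at which
    CURRICULUM[k] is enabled, so harder negatives unlock over time."""
    return [kind for kind, start in zip(CURRICULUM, stage_starts)
            if epoch >= start]


def make_negatives(video: Video, span: Span, others: Sequence[Video],
                   epoch: int, stage_starts: Sequence[int],
                   rng: random.Random) -> List[Tuple[str, Video]]:
    """Build the negatives for one (video, query) pair at a given epoch."""
    negs: List[Tuple[str, Video]] = []
    for kind in allowed_negatives(epoch, stage_starts):
        if kind == "video":
            negs.append((kind, video_level_negative(others, rng)))
        elif kind == "clip":
            other = rng.choice(others)
            negs.append((kind, clip_level_negative(video, span, other, rng)))
        else:
            negs.append((kind, self_shuffled_negative(video, span, rng)))
    return negs


if __name__ == "__main__":
    rng = random.Random(0)
    video = [[float(i), 1.0] for i in range(8)]
    others = [[[100.0 + i, 2.0] for i in range(8)]]
    negs = make_negatives(video, (2, 5), others, epoch=3,
                          stage_starts=[0, 2, 5], rng=rng)
    print([kind for kind, _ in negs])  # ['video', 'clip']
```

Each negative would be paired with the original query and trained toward low query-video relevance. Gating the harder, context-sharing negatives behind later stages is one simple way to realize "easy to hard"; a learned or loss-based difficulty score would be an alternative.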

Cite

Text

Lan et al. "Curriculum Multi-Negative Augmentation for Debiased Video Grounding." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I1.25204

Markdown

[Lan et al. "Curriculum Multi-Negative Augmentation for Debiased Video Grounding." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/lan2023aaai-curriculum/) doi:10.1609/AAAI.V37I1.25204

BibTeX

@inproceedings{lan2023aaai-curriculum,
  title     = {{Curriculum Multi-Negative Augmentation for Debiased Video Grounding}},
  author    = {Lan, Xiaohan and Yuan, Yitian and Chen, Hong and Wang, Xin and Jie, Zequn and Ma, Lin and Wang, Zhi and Zhu, Wenwu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {1213-1221},
  doi       = {10.1609/AAAI.V37I1.25204},
  url       = {https://mlanthology.org/aaai/2023/lan2023aaai-curriculum/}
}