ViLCo-Bench: VIdeo Language COntinual Learning Benchmark

Abstract

Video language continual learning involves continually adapting to information from video and text inputs, enhancing a model’s ability to handle new tasks while retaining prior knowledge. The field remains relatively under-explored, and establishing appropriate datasets is crucial for facilitating communication and research within it. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges such as the memory complexity of long video clips, the natural-language complexity of open queries, and text-video misalignment. We posit that ViLCo-Bench, being more complex than existing continual learning benchmarks, will serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks and addressing the issues of complex and limited annotations. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo.

Cite

Text

Tang et al. "ViLCo-Bench: VIdeo Language COntinual Learning Benchmark." Neural Information Processing Systems, 2024. doi:10.52202/079017-2244

Markdown

[Tang et al. "ViLCo-Bench: VIdeo Language COntinual Learning Benchmark." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/tang2024neurips-vilcobench/) doi:10.52202/079017-2244

BibTeX

@inproceedings{tang2024neurips-vilcobench,
  title     = {{ViLCo-Bench: VIdeo Language COntinual Learning Benchmark}},
  author    = {Tang, Tianqi and Deldari, Shohreh and Xue, Hao and De Melo, Celso and Salim, Flora},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2244},
  url       = {https://mlanthology.org/neurips/2024/tang2024neurips-vilcobench/}
}