VUDG: A Dataset for Video Understanding Domain Generalization

Wang, Ziyi; Gao, Zhi; Yu, Boxuan; Dai, Zirui; Wang, Peiyao; Song, Yuxiang; Lu, Qingyuan; Chen, Jin; Wu, Xinxiao

ICLR 2026

Abstract

Video understanding has made remarkable progress in recent years, largely driven by advances in deep models and the availability of large-scale annotated datasets. However, the robustness of these models to domain shifts encountered in real-world video applications remains a critical yet underexplored problem, limiting their practical reliability. To address this problem, we introduce Video Understanding Domain Generalization (VUDG), the first dataset designed specifically for evaluating domain generalization in video understanding. VUDG contains videos from 11 distinct domains that cover three types of domain shifts, and maintains semantic consistency across different domains to ensure fair and meaningful evaluation. We propose a multi-expert progressive annotation framework to efficiently annotate videos with structured question-answer pairs designed for domain generalization. Extensive experiments on 9 representative Large Vision-Language Models (LVLMs) and several traditional video question answering methods show that most models (including state-of-the-art LVLMs) suffer performance degradation under domain shifts. These results highlight the challenges posed by VUDG and the difference in the robustness of current models to data distribution shifts. We believe VUDG provides a critical resource to benefit future research in domain generalization for video understanding.

Cite

Text

Wang et al. "VUDG: A Dataset for Video Understanding Domain Generalization." International Conference on Learning Representations, 2026.

Markdown

[Wang et al. "VUDG: A Dataset for Video Understanding Domain Generalization." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-vudg/)

BibTeX

@inproceedings{wang2026iclr-vudg,
  title     = {{VUDG: A Dataset for Video Understanding Domain Generalization}},
  author    = {Wang, Ziyi and Gao, Zhi and Yu, Boxuan and Dai, Zirui and Wang, Peiyao and Song, Yuxiang and Lu, Qingyuan and Chen, Jin and Wu, Xinxiao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/wang2026iclr-vudg/}
}