Robust Test-Time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts

Zhang, Bingqing; Cao, Zhuo; Du, Heming; Li, Yang; Li, Xue; Liu, Jiajun; Wang, Sen

ICLR 2026

Abstract

Modern video-text retrieval (VTR) models excel on in-distribution benchmarks but are highly vulnerable to real-world *query shifts*, where the distribution of query data deviates from the training domain, leading to a sharp performance drop. Existing image-focused robustness solutions cannot handle this vulnerability in video because they fail to address the complex spatio-temporal dynamics inherent in these shifts. To evaluate this vulnerability systematically, we first introduce a comprehensive benchmark featuring 12 distinct types of video perturbations at five severity levels. Analysis on this benchmark reveals that query shifts amplify the *hubness phenomenon*, where a few gallery items become dominant "hubs" that attract a disproportionate number of queries. To mitigate this, we propose HAT-VTR (Hubness Alleviation for Test-time Video-Text Retrieval), a baseline test-time adaptation framework designed to directly counteract hubness in VTR. It relies on two key components: a *Hubness Suppression Memory* that refines similarity scores, and *multi-granular losses* that enforce temporal feature consistency. Extensive experiments demonstrate that HAT-VTR substantially improves robustness, consistently outperforming prior methods across diverse query shift scenarios and enhancing model reliability for real-world applications.
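To make the hubness phenomenon described in the abstract concrete, the following is a minimal sketch of one common way to quantify it: count how often each gallery item lands in the top-k results across all queries (the k-occurrence), then look at how skewed that count distribution is. This is an illustration only and is not the paper's Hubness Suppression Memory or evaluation protocol. The function names `k_occurrence` and `hubness_skewness`, the toy similarity matrix, and the choice of skewness as the summary statistic are all assumptions made for this example.

```python
import numpy as np


def k_occurrence(sim: np.ndarray, k: int = 1) -> np.ndarray:
    """Count how often each gallery item appears in some query's top-k.

    sim has shape (num_queries, num_gallery); higher means more similar.
    (Hypothetical helper for illustration, not from the paper.)
    """
    num_gallery = sim.shape[1]
    # Indices of the k most similar gallery items for every query.
    topk = np.argsort(-sim, axis=1)[:, :k]
    return np.bincount(topk.ravel(), minlength=num_gallery)


def hubness_skewness(counts: np.ndarray) -> float:
    """Skewness of the k-occurrence counts; larger values suggest stronger hubness."""
    centered = counts - counts.mean()
    std = counts.std()
    if std == 0:
        return 0.0  # every item is retrieved equally often, so no hubs
    return float((centered ** 3).mean() / std ** 3)


# Toy similarity matrix: 4 queries (rows) against 3 gallery items (columns).
# Gallery item 0 is the nearest neighbor of three of the four queries.
sim = np.array([
    [0.9, 0.2, 0.1],
    [0.8, 0.3, 0.2],
    [0.7, 0.1, 0.6],
    [0.2, 0.1, 0.9],
])

counts = k_occurrence(sim, k=1)
print(counts)                    # [3 0 1]
print(hubness_skewness(counts))  # positive, since item 0 dominates
```

In this toy case gallery item 0 acts as a hub: it is the top-1 result for three queries while item 1 is never retrieved. Under the abstract's claim, one would expect such an imbalance to grow when the queries are perturbed, although the sketch does not reproduce that result.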

Cite

Text

Zhang et al. "Robust Test-Time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts." International Conference on Learning Representations, 2026.

Markdown

[Zhang et al. "Robust Test-Time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-robust-a/)

BibTeX

@inproceedings{zhang2026iclr-robust-a,
  title     = {{Robust Test-Time Video-Text Retrieval: Benchmarking and Adapting for Query Shifts}},
  author    = {Zhang, Bingqing and Cao, Zhuo and Du, Heming and Li, Yang and Li, Xue and Liu, Jiajun and Wang, Sen},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhang2026iclr-robust-a/}
}