Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval

Abstract

Recent progress in text–video retrieval has been largely driven by contrastive learning. However, existing methods often overlook the effect of the modality gap, which causes anchor representations to undergo in-place optimization (i.e., optimization tension) that limits their alignment capacity. Moreover, noisy hard negatives further distort the semantics of anchors. To address these issues, we propose GARE, a Gap-Aware Retrieval framework that introduces a learnable, pair-specific increment $\Delta_{ij}$ between text $t_i$ and video $v_j$, redistributing gradients to relieve optimization tension and absorb noise. We derive $\Delta_{ij}$ via a multivariate first-order Taylor expansion of the InfoNCE loss under a trust-region constraint, showing that it guides updates along locally consistent descent directions. A lightweight neural module conditioned on the semantic gap couples increments across batches for structure-aware correction. Furthermore, we regularize $\Delta$ through a variational information bottleneck with relaxed compression, enhancing stability and semantic consistency. Experiments on four benchmarks demonstrate that GARE consistently improves alignment accuracy and robustness, validating the effectiveness of gap-aware tension mitigation.
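
To make the first-order idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a plain dot-product similarity $s_{ij} = (t_i + \Delta_{ij})^\top v_j / \tau$ and a text-to-video InfoNCE loss, and it uses the standard closed-form result that minimizing a linearized loss inside a ball of radius $\epsilon$ gives a step along the negative normalized gradient. The function names `info_nce` and `trust_region_increment`, and the parameters `eps` and `tau`, are hypothetical. GARE itself predicts $\Delta_{ij}$ with a learned module and regularizes it with an information bottleneck, neither of which is shown here.

```python
import numpy as np


def info_nce(text, video, delta, tau=0.1):
    """Text-to-video InfoNCE where delta[i, j] is added to text[i] for pair (i, j).

    Assumption: similarity is a plain dot product scaled by 1 / tau.
    Returns the mean loss and the row-wise softmax probabilities.
    """
    # logits[i, j] = (text[i] + delta[i, j]) . video[j] / tau
    logits = np.einsum("ijd,jd->ij", text[:, None, :] + delta, video) / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob)), np.exp(log_prob)


def trust_region_increment(text, video, eps=0.1, tau=0.1):
    """Closed-form minimizer of the linearized loss subject to ||delta[i, j]|| <= eps."""
    batch, dim = text.shape
    _, prob = info_nce(text, video, np.zeros((batch, batch, dim)), tau)
    # Gradient of the loss w.r.t. delta[i, j], evaluated at delta = 0:
    # (p_ij - 1[i == j]) * video[j] / (batch * tau)
    coeff = (prob - np.eye(batch)) / (batch * tau)
    grad = coeff[:, :, None] * video[None, :, :]
    norm = np.linalg.norm(grad, axis=-1, keepdims=True)
    # Step of length eps along the negative gradient direction.
    return -eps * grad / np.maximum(norm, 1e-12)


rng = np.random.default_rng(0)
text = rng.normal(size=(4, 8))
video = rng.normal(size=(4, 8))
text /= np.linalg.norm(text, axis=1, keepdims=True)
video /= np.linalg.norm(video, axis=1, keepdims=True)

delta = trust_region_increment(text, video)
loss_before, _ = info_nce(text, video, np.zeros_like(delta))
loss_after, _ = info_nce(text, video, delta)
print(f"loss without increments: {loss_before:.4f}")
print(f"loss with increments:    {loss_after:.4f}")
```

In this simplified setting the increment raises each matched logit and lowers each mismatched one, so the loss decreases without moving the anchor embeddings themselves. That is one way to read the abstract's claim that $\Delta_{ij}$ redistributes gradients away from the anchors.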

Cite

Text

Xiao et al. "Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval." Advances in Neural Information Processing Systems, 2025.

Markdown

[Xiao et al. "Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/xiao2025neurips-rebalancing/)

BibTeX

@inproceedings{xiao2025neurips-rebalancing,
  title     = {{Rebalancing Contrastive Alignment with Bottlenecked Semantic Increments in Text-Video Retrieval}},
  author    = {Xiao, Jian and Song, Zijie and Hu, Jialong and Cheng, Hao and Hu, Zhenzhen and Li, Jia and Hong, Richang},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/xiao2025neurips-rebalancing/}
}