Language‑Bias‑Resilient Visual Question Answering via Adaptive Multi‑Margin Collaborative Debiasing

Abstract

Language bias in Visual Question Answering (VQA) arises when models exploit spurious statistical correlations between question templates and answers, thereby neglecting essential visual cues and compromising genuine multimodal reasoning, particularly in out-of-distribution scenarios. Despite numerous efforts to enhance the robustness of VQA models, a principled understanding of how such bias originates and influences model behavior remains underdeveloped. In this paper, we address this gap through a comprehensive empirical and theoretical analysis, revealing that modality-specific gradient imbalances, which originate from the inherent heterogeneity of multimodal data, lead to skewed feature fusion and biased classifier weights. To alleviate these issues, we propose a novel Multi-Margin Collaborative Debiasing (MMCD) framework that adaptively integrates frequency-, confidence-, and difficulty-aware angular margins with a dynamic difficulty-aware contrastive learning mechanism to reshape decision boundaries dynamically. Extensive experiments across multiple challenging VQA benchmarks show that MMCD consistently outperforms state-of-the-art baselines in combating language bias.
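
The abstract gives no formulas, so the following is only an illustrative sketch of what a frequency-aware angular margin could look like. It is not the authors' MMCD formulation. It assumes a generic additive angular margin on cosine logits (in the style of ArcFace) and a hypothetical rule in which rarer answers receive larger margins, using the inverse fourth-root scaling known from LDAM-style margins. All function names, the `max_margin` and `scale` parameters, and the scaling rule are assumptions made for illustration; the confidence- and difficulty-aware margins and the contrastive component are not modeled.

```python
import math

def frequency_aware_margins(class_counts, max_margin=0.5):
    """Hypothetical rule: rarer answers get larger margins (radians).

    Assumes every count is positive. Uses inverse fourth-root scaling,
    normalized so the rarest class receives exactly max_margin.
    """
    raw = [n ** -0.25 for n in class_counts]
    top = max(raw)
    return [max_margin * r / top for r in raw]

def cosine(u, v):
    """Cosine similarity, clipped to [-1, 1] so acos stays in its domain."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return max(-1.0, min(1.0, dot / norm))

def angular_margin_loss(feature, class_weights, label, margins, scale=16.0):
    """Cross-entropy over scaled cosine logits for one sample.

    The angle to the ground-truth class is enlarged by that class's margin
    (capped at pi), which makes the target harder to satisfy.
    """
    logits = []
    for k, w in enumerate(class_weights):
        theta = math.acos(cosine(feature, w))
        if k == label:
            theta = min(theta + margins[k], math.pi)
        logits.append(scale * math.cos(theta))
    # Numerically stable log-sum-exp.
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[label]
```

In this sketch, a larger margin on the ground-truth class lowers its logit and so raises the loss for the same feature, which pushes the fused representation further from the decision boundary of frequent answers. How MMCD actually combines its three margins is described in the paper itself, not here.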

Cite

Text

Zhu et al. "Language‑Bias‑Resilient Visual Question Answering via Adaptive Multi‑Margin Collaborative Debiasing." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhu et al. "Language‑Bias‑Resilient Visual Question Answering via Adaptive Multi‑Margin Collaborative Debiasing." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhu2025neurips-languagebiasresilient/)

BibTeX

@inproceedings{zhu2025neurips-languagebiasresilient,
  title     = {{Language‑Bias‑Resilient Visual Question Answering via Adaptive Multi‑Margin Collaborative Debiasing}},
  author    = {Zhu, Huanjia and Zheng, Shuyuan and Liu, Yishu and Cai, Sudong and Chen, Bingzhi},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhu2025neurips-languagebiasresilient/}
}