Mitigating Lies in Vision-Language Models

Abstract

In this work, we bring new insights into the honesty of vision-language models, particularly in visual question answering (VQA). After a thorough revisit of the existing ‘lie’ behavior in pure language models, our work extends ‘lies’ to vision-language models for the first time. The results indicate that lie prefixes mislead vision-language models more strongly than they mislead language models. We also propose a novel visual prefix and show that a consistent vision-language prefix is more threatening to vision-language models. To defend the models against these ‘lies’, we put forward an unsupervised framework based on Gaussian mixture modeling and obtain improvements of 3% against the language prefix and 12% against the vision-language prefix.

Cite

Text

Li et al. "Mitigating Lies in Vision-Language Models." NeurIPS 2022 Workshops: MLSW, 2022.

Markdown

[Li et al. "Mitigating Lies in Vision-Language Models." NeurIPS 2022 Workshops: MLSW, 2022.](https://mlanthology.org/neuripsw/2022/li2022neuripsw-mitigating/)

BibTeX

@inproceedings{li2022neuripsw-mitigating,
  title     = {{Mitigating Lies in Vision-Language Models}},
  author    = {Li, Junbo and Li, Xianhang and Xie, Cihang},
  booktitle = {NeurIPS 2022 Workshops: MLSW},
  year      = {2022},
  url       = {https://mlanthology.org/neuripsw/2022/li2022neuripsw-mitigating/}
}