Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-Modal Language Models

Abstract

While Multi-modal Language Models (MLMs) demonstrate impressive multimodal ability, they still struggle to provide factual and precise responses for tasks like visual question answering (VQA). In this paper, we address this challenge from the perspective of contextual information. We propose Causal Context Generation (Causal-CoG), a prompting strategy that engages contextual information to enhance precise VQA during inference. Specifically, we prompt MLMs to generate contexts, i.e., text descriptions of an image, and engage the generated contexts for question answering. Moreover, we investigate the advantage of contexts on VQA from a causality perspective, introducing causality filtering to select samples for which contextual information is helpful. To show the effectiveness of Causal-CoG, we run extensive experiments on 10 multimodal benchmarks and show consistent improvements over direct decoding, e.g., +6.30% on POPE, +13.69% on VizWiz, and +6.43% on VQAv2, surpassing existing methods. We hope Causal-CoG inspires explorations of context knowledge in multimodal models and serves as a plug-and-play strategy for MLM decoding.
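The abstract describes two steps: generating image descriptions as context for answering, then filtering by causal effect so that context is used only where it helps. The Python sketch below illustrates that control flow under stated assumptions. The model interfaces `describe` and `answer_probs`, the parameters `num_contexts` and `threshold`, and the "effect" measure (the change in the context-preferred answer's probability relative to direct decoding) are all hypothetical simplifications. They are not the paper's actual API or its exact causal-effect estimator.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical model interfaces (assumptions for this sketch, not the paper's API):
#   describe(image) -> str: asks the MLM for a text description (the "context").
#   answer_probs(image, question, context) -> {answer: probability};
#     context=None means direct decoding without any generated context.
Describe = Callable[[str], str]
AnswerProbs = Callable[[str, str, Optional[str]], Dict[str, float]]


def causal_cog_sketch(
    image: str,
    question: str,
    describe: Describe,
    answer_probs: AnswerProbs,
    num_contexts: int = 3,
    threshold: float = 0.0,
) -> str:
    """Answer a VQA question, using generated contexts only when they appear to help."""
    # 1) Direct decoding: the answer distribution with no generated context.
    direct = answer_probs(image, question, None)

    # 2) Context generation: sample several text descriptions of the image.
    contexts: List[str] = [describe(image) for _ in range(num_contexts)]

    # 3) Answer with each context and average the resulting distributions.
    averaged: Dict[str, float] = {}
    for ctx in contexts:
        for ans, p in answer_probs(image, question, ctx).items():
            averaged[ans] = averaged.get(ans, 0.0) + p / num_contexts

    # 4) Simplified "causal filter" (hypothetical): the effect of context is the
    #    change in the context-preferred answer's probability vs. direct decoding.
    ctx_answer = max(averaged, key=averaged.get)
    effect = averaged[ctx_answer] - direct.get(ctx_answer, 0.0)

    # 5) Keep the context-based answer only if the context raised its probability
    #    by more than the threshold; otherwise fall back to the direct answer.
    if effect > threshold:
        return ctx_answer
    return max(direct, key=direct.get)


if __name__ == "__main__":
    # Toy stand-ins for an MLM, purely for illustration.
    def fake_describe(image: str) -> str:
        return "a red bus parked on a street"

    def fake_answer_probs(image: str, question: str, context: Optional[str]) -> Dict[str, float]:
        if context is None:
            return {"blue": 0.55, "red": 0.45}
        return {"red": 0.8, "blue": 0.2}

    print(causal_cog_sketch("bus.jpg", "What color is the bus?", fake_describe, fake_answer_probs))
    # prints "red"
```

In this toy run, the context raises the probability of "red" from 0.45 to 0.8, so the context-based answer is kept. With `threshold=0.5`, the same run would fall back to the direct answer "blue". The point of the filter is to avoid letting a generated description override the direct answer when it does not clearly help.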

Cite

Text

Zhao et al. "Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-Modal Language Models." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01267

Markdown

[Zhao et al. "Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-Modal Language Models." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhao2024cvpr-causalcog/) doi:10.1109/CVPR52733.2024.01267

BibTeX

@inproceedings{zhao2024cvpr-causalcog,
  title     = {{Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-Modal Language Models}},
  author    = {Zhao, Shitian and Li, Zhuowan and Lu, Yadong and Yuille, Alan and Wang, Yan},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {13342-13351},
  doi       = {10.1109/CVPR52733.2024.01267},
  url       = {https://mlanthology.org/cvpr/2024/zhao2024cvpr-causalcog/}
}