What if the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models

Abstract

Counterfactual reasoning is one of the core abilities of human intelligence. It involves considering alternatives to observed states or past events, which can improve our ability to plan and make decisions. In this work, we benchmark the counterfactual reasoning ability of multimodal large language models. We take question-answer pairs from the VQAv2 dataset and add one counterfactual presupposition to each question, modifying the answer accordingly. After generating the counterfactual questions and answers with ChatGPT, we manually examine all of them to ensure correctness. This yields over 2k counterfactual question-answer pairs. We evaluate recent vision-language models on the newly collected test set and find that all models exhibit a large performance drop compared with their results on questions without a counterfactual presupposition. This result indicates that current vision-language models still have room for improvement. We hope the proposed benchmark can help the development of future systems.
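
The evaluation described above compares accuracy on the original questions with accuracy on their counterfactual variants. A minimal Python sketch of that comparison follows, assuming exact-match scoring after light normalization. The `PairedItem` schema, the `predict` callable, and the `accuracy_drop` helper are hypothetical names introduced for illustration; they are not taken from the paper or its released code, and the paper's actual scoring may differ.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass(frozen=True)
class PairedItem:
    """One benchmark item (hypothetical schema, not the paper's file format)."""
    image_id: str
    question: str      # original VQAv2-style question
    answer: str        # answer to the original question
    cf_question: str   # question with one counterfactual presupposition added
    cf_answer: str     # answer adjusted to the presupposition


def normalize(text: str) -> str:
    """Lowercase, trim, drop trailing periods, and collapse whitespace."""
    return " ".join(text.lower().strip().rstrip(".").split())


def accuracy_drop(
    items: Iterable[PairedItem],
    predict: Callable[[str, str], str],
) -> Tuple[float, float, float]:
    """Return (original accuracy, counterfactual accuracy, drop).

    `predict(image_id, question)` stands in for any vision-language model.
    Scoring is exact match after normalization, which is an assumption.
    """
    items = list(items)
    if not items:
        raise ValueError("need at least one item")
    orig_hits = sum(
        normalize(predict(it.image_id, it.question)) == normalize(it.answer)
        for it in items
    )
    cf_hits = sum(
        normalize(predict(it.image_id, it.cf_question)) == normalize(it.cf_answer)
        for it in items
    )
    n = len(items)
    return orig_hits / n, cf_hits / n, (orig_hits - cf_hits) / n
```

A model that answers from the image alone and ignores the presupposition would score well on `question` but poorly on `cf_question`, which is the kind of gap the benchmark is designed to expose.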

Cite

Text

Zhang et al. "What if the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models." IEEE/CVF International Conference on Computer Vision Workshops, 2023. doi:10.1109/ICCVW60793.2023.00497

Markdown

[Zhang et al. "What if the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models." IEEE/CVF International Conference on Computer Vision Workshops, 2023.](https://mlanthology.org/iccvw/2023/zhang2023iccvw-tv/) doi:10.1109/ICCVW60793.2023.00497

BibTeX

@inproceedings{zhang2023iccvw-tv,
  title     = {{What if the TV Was Off? Examining Counterfactual Reasoning Abilities of Multi-Modal Language Models}},
  author    = {Zhang, Letian and Zhai, Xiaotong and Zhao, Zhongkai and Wen, Xin and Zhao, Bingchen},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2023},
  pages     = {4631--4635},
  doi       = {10.1109/ICCVW60793.2023.00497},
  url       = {https://mlanthology.org/iccvw/2023/zhang2023iccvw-tv/}
}