Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Abstract

Large multimodal models (LMMs) excel at following human instructions. However, as multimodal interaction and context lengths grow, instructions can come to contradict one another, which is especially challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate how well LMMs recognize conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms, and is constructed with a novel automatic dataset creation framework that expedites the process and lets us cover a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. We therefore propose Cognitive Awakening Prompting, which injects cognition from external sources and substantially enhances dissonance detection. Our website, dataset, and code are available.
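To make the idea of externally injected cognition concrete, here is a minimal sketch of how a reminder about possible instruction conflicts might be prepended to a query before it reaches an LMM. The `call_lmm` function and the exact wording of the awakening prompt are hypothetical placeholders for illustration only, not the paper's actual implementation.

```python
# Minimal sketch: prepend an external "cognitive awakening" reminder to an LMM query.
# `call_lmm` is a hypothetical stand-in for whatever chat/completion API is in use,
# and the prompt wording is illustrative, not the paper's actual prompt.

AWAKENING_PROMPT = (
    "Before answering, check whether the instructions and the image "
    "contradict each other. If they conflict, point out the conflict "
    "instead of following the instructions blindly."
)

def query_with_awakening(call_lmm, instruction: str, image_path: str) -> str:
    """Prime the model to detect dissonance before it follows the instruction."""
    prompt = f"{AWAKENING_PROMPT}\n\nInstruction: {instruction}"
    return call_lmm(prompt=prompt, image=image_path)
```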

Cite

Text

Gao et al. "Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72998-0_23

Markdown

[Gao et al. "Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/gao2024eccv-dissecting/) doi:10.1007/978-3-031-72998-0_23

BibTeX

@inproceedings{gao2024eccv-dissecting,
  title     = {{Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions}},
  author    = {Gao, Jin and Gan, Lei and Li, Yuankai and Ye, Yixin and Wang, Dequan},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72998-0_23},
  url       = {https://mlanthology.org/eccv/2024/gao2024eccv-dissecting/}
}