Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions

Abstract

Large multimodal models (LMMs) excel at following human instructions. However, as multimodal interaction and context lengths grow, instructions can come to contradict one another, which is especially challenging for language beginners and vulnerable populations. We introduce the Self-Contradictory Instructions benchmark to evaluate how well LMMs recognize conflicting commands. It comprises 20,000 conflicts, evenly distributed between language and vision paradigms, and is constructed with a novel automatic dataset creation framework that expedites the process and lets us cover a wide range of instruction forms. Our comprehensive evaluation reveals that current LMMs consistently struggle to identify multimodal instruction discordance due to a lack of self-awareness. We therefore propose Cognitive Awakening Prompting, which injects cognition from external sources and substantially enhances dissonance detection. Our website, dataset, and code are available.
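To make the idea of externally injected cognition concrete, here is a minimal sketch of how a reminder about possible instruction conflicts might be prepended to a query before it reaches an LMM. The `call_lmm` function and the exact wording of the awakening prompt are hypothetical placeholders for illustration only, not the paper's actual implementation.

```python
# Minimal sketch: prepend an external "cognitive awakening" reminder to an LMM query.
# `call_lmm` is a hypothetical stand-in for whatever chat/completion API is in use,
# and the prompt wording is illustrative, not the paper's actual prompt.

AWAKENING_PROMPT = (
    "Before answering, check whether the instructions and the image "
    "contradict each other. If they conflict, point out the conflict "
    "instead of following the instructions blindly."
)

def query_with_awakening(call_lmm, instruction: str, image_path: str) -> str:
    """Prime the model to detect dissonance before it follows the instruction."""
    prompt = f"{AWAKENING_PROMPT}\n\nInstruction: {instruction}"
    return call_lmm(prompt=prompt, image=image_path)
```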

Cite

Text

Gao et al. "Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72998-0_23

Markdown

[Gao et al. "Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/gao2024eccv-dissecting/) doi:10.1007/978-3-031-72998-0_23

BibTeX

@inproceedings{gao2024eccv-dissecting,
  title     = {{Dissecting Dissonance: Benchmarking Large Multimodal Models Against Self-Contradictory Instructions}},
  author    = {Gao, Jin and Gan, Lei and Li, Yuankai and Ye, Yixin and Wang, Dequan},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72998-0_23},
  url       = {https://mlanthology.org/eccv/2024/gao2024eccv-dissecting/}
}