iWISDM: Assessing Instruction Following in Multimodal Models at Scale

Abstract

The ability to perform complex tasks from detailed instructions is key to the remarkable achievements of our species. As humans, we are capable of performing not only a wide variety of tasks but also very complex ones that may entail hundreds or thousands of steps to complete. Large language models and their more recent multimodal counterparts that integrate textual and visual inputs have achieved unprecedented success in performing complex tasks. Yet most existing benchmarks are largely confined to single-modality inputs (either text or vision), which narrows the scope of multimodal integration assessments, particularly for instruction following in multimodal contexts. To bridge this gap, we introduce the instructed-Virtual VISual Decision Making (iWISDM) environment, engineered to generate a limitless array of vision-language tasks of varying complexity. Using iWISDM, we compiled three distinct benchmarks of instruction-following visual tasks across varying complexity levels and evaluated several newly developed multimodal models on these benchmarks. Our findings establish iWISDM as a robust benchmark for assessing the instructional adherence of both existing and emergent multimodal models and highlight a large gap in these models’ ability to precisely follow instructions.

Cite

Text

Lei et al. "iWISDM: Assessing Instruction Following in Multimodal Models at Scale." ICML 2024 Workshops: LLMs_and_Cognition, 2024.

Markdown

[Lei et al. "iWISDM: Assessing Instruction Following in Multimodal Models at Scale." ICML 2024 Workshops: LLMs_and_Cognition, 2024.](https://mlanthology.org/icmlw/2024/lei2024icmlw-iwisdm/)

BibTeX

@inproceedings{lei2024icmlw-iwisdm,
  title     = {{iWISDM: Assessing Instruction Following in Multimodal Models at Scale}},
  author    = {Lei, Xiaoxuan and Gomez, Lucas and Bai, Hao Yuan and Bashivan, Pouya},
  booktitle = {ICML 2024 Workshops: LLMs_and_Cognition},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/lei2024icmlw-iwisdm/}
}