Iterative Theory of Mind Assay of Multimodal AI Models

Abstract

The concept of artificial general intelligence (AGI) has sparked intense debate across various sectors, fueled by the capabilities of Large Language Model-based AI systems like ChatGPT. However, the AI community remains divided on whether such models truly understand language and its contexts. Developing multimodal AI systems, which can engage with the user across multiple input and output modalities, is seen as a crucial step towards AGI. We employ a novel iterated Theory of Mind (iToM) test to reveal the limitations of current multimodal LLMs such as ChatGPT 4o in converging to coherent, unified internal world models. These limitations result in illogical and inconsistent user interactions, both within and across input and output modalities. We also identify new multimodal confabulations ("hallucinations"), particularly in languages with less training data, such as Bengali.

Cite

Text

Das et al. "Iterative Theory of Mind Assay of Multimodal AI Models." ICML 2024 Workshops: LLMs_and_Cognition, 2024.

Markdown

[Das et al. "Iterative Theory of Mind Assay of Multimodal AI Models." ICML 2024 Workshops: LLMs_and_Cognition, 2024.](https://mlanthology.org/icmlw/2024/das2024icmlw-iterative/)

BibTeX

@inproceedings{das2024icmlw-iterative,
  title     = {{Iterative Theory of Mind Assay of Multimodal AI Models}},
  author    = {Das, Rohini Elora and Das, Rajarshi and Maity, Niharika and Das, Sreerupa},
  booktitle = {ICML 2024 Workshops: LLMs_and_Cognition},
  year      = {2024},
  url       = {https://mlanthology.org/icmlw/2024/das2024icmlw-iterative/}
}