Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models

Wu, Wenshan; Mao, Shaoguang; Zhang, Yadong; Xia, Yan; Dong, Li; Cui, Lei; Wei, Furu

doi:10.52202/079017-2866

NeurIPS 2024

Abstract

Large language models (LLMs) have exhibited impressive performance in language comprehension and various reasoning tasks. However, their abilities in spatial reasoning, a crucial aspect of human cognition, remain relatively unexplored. Humans possess a remarkable ability to create mental images of unseen objects and actions through a process known as the Mind's Eye, enabling the imagination of the unseen world. Inspired by this cognitive capacity, we propose Visualization-of-Thought (VoT) prompting. VoT aims to elicit spatial reasoning in LLMs by visualizing their reasoning traces, thereby guiding subsequent reasoning steps. We employed VoT for multi-hop spatial reasoning tasks, including natural language navigation, visual navigation, and visual tiling in 2D grid worlds. Experimental results demonstrated that VoT significantly enhances the spatial reasoning abilities of LLMs. Notably, VoT outperformed existing multimodal large language models (MLLMs) in these tasks. While VoT works surprisingly well on LLMs, the ability to generate mental images to facilitate spatial reasoning resembles the mind's eye process, suggesting its potential viability in MLLMs. The dataset and code are available on our project page.
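To make "visualizing reasoning traces" concrete, here is a minimal sketch of the idea in a 2D grid world. It is not the authors' implementation: it simply tracks a position and emits a text rendering after every move, which is the kind of interleaved state picture VoT asks the model itself to produce between reasoning steps. The grid symbols, move names, and function names are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical move vocabulary (row delta, column delta); not taken from the paper.
MOVES: Dict[str, Tuple[int, int]] = {
    "up": (-1, 0),
    "down": (1, 0),
    "left": (0, -1),
    "right": (0, 1),
}


def render(rows: int, cols: int, pos: Tuple[int, int]) -> str:
    """Draw the grid as text: 'A' marks the agent, '.' marks an empty cell."""
    return "\n".join(
        "".join("A" if (r, c) == pos else "." for c in range(cols))
        for r in range(rows)
    )


def trace(rows: int, cols: int, start: Tuple[int, int], steps: List[str]) -> List[str]:
    """Return one rendering per state: the initial grid, then the grid after each move."""
    pos = start
    frames = [render(rows, cols, pos)]
    for step in steps:
        dr, dc = MOVES[step]
        r, c = pos[0] + dr, pos[1] + dc
        if not (0 <= r < rows and 0 <= c < cols):
            raise ValueError(f"move {step!r} leaves the grid from {pos}")
        pos = (r, c)
        frames.append(render(rows, cols, pos))
    return frames


# Start in the top-left corner of a 3x3 grid, move right, then down.
frames = trace(3, 3, (0, 0), ["right", "down"])
print("\n\n".join(frames))
```

Each frame plays the role of a "mental image" attached to one reasoning step. In VoT the model generates such pictures itself in response to the prompt; here they are computed deterministically only to show what the interleaved trace looks like.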

Cite

Text

Wu et al. "Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models." Neural Information Processing Systems, 2024. doi:10.52202/079017-2866

Markdown

[Wu et al. "Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/wu2024neurips-mind/) doi:10.52202/079017-2866

BibTeX

@inproceedings{wu2024neurips-mind,
  title     = {{Mind's Eye of LLMs: Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models}},
  author    = {Wu, Wenshan and Mao, Shaoguang and Zhang, Yadong and Xia, Yan and Dong, Li and Cui, Lei and Wei, Furu},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2866},
  url       = {https://mlanthology.org/neurips/2024/wu2024neurips-mind/}
}