Enhancing LLM Reasoning via Vision-Augmented Prompting

Abstract

Verbal and visual-spatial information processing are two critical subsystems that activate different brain regions and often work together in cognitive reasoning. Despite the rapid advancement of LLM-based reasoning, mainstream frameworks such as Chain-of-Thought (CoT) and its variants focus primarily on the verbal dimension, which limits their ability to solve reasoning problems that involve visual and spatial clues. To bridge this gap, we propose a novel dual-modality reasoning framework called Vision-Augmented Prompting (VAP). Given a textual problem description, VAP automatically synthesizes an image from the visual and spatial clues using external drawing tools. VAP then formulates a chain of thought in both modalities and iteratively refines the synthesized image. Finally, a conclusive reasoning scheme based on self-alignment generates the final result. Extensive experiments on four diverse tasks, namely geometry problem solving, Sudoku, time series prediction, and the travelling salesman problem, validate the superiority of VAP over existing LLM-based reasoning frameworks.
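The three stages described in the abstract (image synthesis, dual-modality chain of thought with iterative image refinement, and a conclusive reasoning step) can be pictured as a simple control loop. This is a minimal, hypothetical sketch, not the authors' implementation. The names `vision_augmented_prompting`, `plan_drawing`, `reason_step`, and `conclude` are illustrative assumptions. The "image" is stood in for by a list of drawing-command strings. The LLM calls and the self-alignment scheme are replaced by caller-supplied stubs.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical stand-in for a synthesized image: a list of drawing commands
# that an external drawing tool would render.
Sketch = List[str]


@dataclass
class VAPTrace:
    sketch: Sketch
    thoughts: List[str] = field(default_factory=list)
    answer: str = ""


def vision_augmented_prompting(
    problem: str,
    plan_drawing: Callable[[str], Sketch],
    reason_step: Callable[[str, Sketch, List[str]], Tuple[str, Sketch, bool]],
    conclude: Callable[[str, Sketch, List[str]], str],
    max_steps: int = 5,
) -> VAPTrace:
    # Stage 1 (assumed interface): turn visual/spatial clues into a drawing.
    trace = VAPTrace(sketch=plan_drawing(problem))
    # Stage 2: each step yields a verbal thought plus a refined drawing,
    # and a flag saying whether reasoning is finished.
    for _ in range(max_steps):
        thought, new_sketch, done = reason_step(problem, trace.sketch, trace.thoughts)
        trace.thoughts.append(thought)
        trace.sketch = new_sketch
        if done:
            break
    # Stage 3: placeholder for the paper's self-alignment-based conclusion.
    trace.answer = conclude(problem, trace.sketch, trace.thoughts)
    return trace


# Toy stubs (hypothetical) for a two-point geometry question.
def toy_plan(problem: str) -> Sketch:
    return ["point A 0 0", "point B 3 4"]


def toy_step(problem: str, sketch: Sketch, thoughts: List[str]) -> Tuple[str, Sketch, bool]:
    # Read point coordinates back out of the drawing commands.
    pts = {}
    for cmd in sketch:
        parts = cmd.split()
        if parts[0] == "point":
            pts[parts[1]] = (float(parts[2]), float(parts[3]))
    (x1, y1), (x2, y2) = pts["A"], pts["B"]
    length = math.hypot(x2 - x1, y2 - y1)
    # Refine the drawing by adding the segment being reasoned about.
    return f"|AB| = {length}", sketch + ["segment A B"], True


def toy_conclude(problem: str, sketch: Sketch, thoughts: List[str]) -> str:
    return thoughts[-1].split("= ")[1]


trace = vision_augmented_prompting("Find the length of AB.", toy_plan, toy_step, toy_conclude)
print(trace.answer)  # prints 5.0
```

In a real system, `plan_drawing` and `reason_step` would presumably wrap an LLM that emits commands for an external drawing tool (the abstract does not name one), and `conclude` would implement the self-alignment scheme, which this sketch does not reproduce.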

Cite

Text

Xiao et al. "Enhancing LLM Reasoning via Vision-Augmented Prompting." Neural Information Processing Systems, 2024. doi:10.52202/079017-0905

Markdown

[Xiao et al. "Enhancing LLM Reasoning via Vision-Augmented Prompting." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/xiao2024neurips-enhancing-a/) doi:10.52202/079017-0905

BibTeX

@inproceedings{xiao2024neurips-enhancing-a,
  title     = {{Enhancing LLM Reasoning via Vision-Augmented Prompting}},
  author    = {Xiao, Ziyang and Zhang, Dongxiang and Han, Xiongwei and Fu, Xiaojin and Yu, Wing Yin and Zhong, Tao and Wu, Sai and Wang, Yuan and Yin, Jianwei and Chen, Gang},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-0905},
  url       = {https://mlanthology.org/neurips/2024/xiao2024neurips-enhancing-a/}
}