Enhancing LLM Reasoning via Vision-Augmented Prompting
Abstract
Verbal and visual-spatial information processing are two critical subsystems that activate different brain regions and often collaborate in cognitive reasoning. Despite the rapid advancement of LLM-based reasoning, mainstream frameworks such as Chain-of-Thought (CoT) and its variants focus primarily on the verbal dimension, which limits their ability to tackle reasoning problems with visual and spatial clues. To bridge this gap, we propose a novel dual-modality reasoning framework called Vision-Augmented Prompting (VAP). Upon receiving a textual problem description, VAP automatically synthesizes an image from the visual and spatial clues by utilizing external drawing tools. Subsequently, VAP formulates a chain of thought in both modalities and iteratively refines the synthesized image. Finally, a conclusive reasoning scheme based on self-alignment is proposed for final result generation. Extensive experiments are conducted across four versatile tasks: solving geometry problems, Sudoku, time series prediction, and the travelling salesman problem. The results validate the superiority of VAP over existing LLM-based reasoning frameworks.
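The control flow described in the abstract (synthesize an image, reason and refine iteratively, then conclude) can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`VapState`, `run_vap`, `draw_initial`, `step`, `is_done`, `conclude`) is hypothetical, the "image" is a plain string standing in for the output of a drawing tool, and the self-alignment step is left as an opaque `conclude` callable because the abstract does not specify it.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class VapState:
    problem: str                 # original textual problem description
    image: str                   # stand-in for the synthesized image
    thoughts: List[str] = field(default_factory=list)  # verbal reasoning trace


def run_vap(
    problem: str,
    draw_initial: Callable[[str], str],
    step: Callable[[VapState], Tuple[str, str]],
    is_done: Callable[[VapState], bool],
    conclude: Callable[[VapState], str],
    max_iters: int = 10,
) -> Tuple[str, VapState]:
    """Hypothetical driver loop: draw, then alternate thought and image refinement."""
    # Assumption: the initial image is produced from the problem text alone.
    state = VapState(problem=problem, image=draw_initial(problem))
    for _ in range(max_iters):
        if is_done(state):
            break
        # Each step yields one verbal thought and one refined image.
        thought, new_image = step(state)
        state.thoughts.append(thought)
        state.image = new_image
    # Assumption: the final answer is derived from the accumulated state.
    return conclude(state), state


# Toy stand-ins for the LLM and the drawing tool, for illustration only.
answer, final_state = run_vap(
    problem="place three marks on a line",
    draw_initial=lambda p: "",
    step=lambda s: (f"add mark {len(s.image) + 1}", s.image + "x"),
    is_done=lambda s: len(s.image) == 3,
    conclude=lambda s: f"{len(s.image)} marks drawn",
)
print(answer)
```

In a real setting the callables would presumably wrap a multimodal model and a plotting library; passing them in as parameters keeps the loop itself independent of any particular model or drawing tool.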
Cite
Text
Xiao et al. "Enhancing LLM Reasoning via Vision-Augmented Prompting." Neural Information Processing Systems, 2024. doi:10.52202/079017-0905
Markdown
[Xiao et al. "Enhancing LLM Reasoning via Vision-Augmented Prompting." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/xiao2024neurips-enhancing-a/) doi:10.52202/079017-0905
BibTeX
@inproceedings{xiao2024neurips-enhancing-a,
title = {{Enhancing LLM Reasoning via Vision-Augmented Prompting}},
author = {Xiao, Ziyang and Zhang, Dongxiang and Han, Xiongwei and Fu, Xiaojin and Yu, Wing Yin and Zhong, Tao and Wu, Sai and Wang, Yuan and Yin, Jianwei and Chen, Gang},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-0905},
url = {https://mlanthology.org/neurips/2024/xiao2024neurips-enhancing-a/}
}