Towards Visualization-of-Thought Jailbreak Attack Against Large Visual Language Models
Abstract
As Visual Language Models (VLMs) continue to evolve, they have demonstrated increasingly sophisticated logical reasoning and multimodal thought generation, opening the door to widespread applications. However, this advancement raises serious concerns about content security, particularly when these models process complex multimodal inputs that require intricate reasoning. Prior work largely overlooks the critical tension between a VLM's logical-reasoning objective and its safety objective when the model faces such challenges. In this paper, we introduce the Visualization-of-Thought Attack (VoTA), a novel, automated attack framework that strategically constructs chains of images carrying risky visual thoughts to challenge victim models. Our attack provokes the inherent conflict between the model's logical processing and its safety protocols, ultimately leading to the generation of unsafe content. Through comprehensive experiments, VoTA achieves remarkable effectiveness, improving the average attack success rate (ASR) by 26.71% (from 63.70% to 90.41%) across 9 open-source and 6 commercial VLMs compared with state-of-the-art methods. These results expose a critical vulnerability: current VLMs struggle to maintain safety guarantees when processing insecure multimodal visualization-of-thought inputs, underscoring the urgent need for stronger safety alignment. Our code and dataset are available at https://github.com/Hongqiong12/VoTA. Content Warning: This paper contains harmful content that may be offensive.
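The pipeline described in the abstract can be illustrated with a minimal, hypothetical sketch: decompose a risky query into stepwise "visual thoughts," render each thought as an image, and ask the victim model to continue the visualized reasoning. All names below (decompose_into_thoughts, render_thought, query_vlm, vota_attack) and the fixed step-by-step decomposition are illustrative assumptions, not the authors' released implementation; see the GitHub repository above for the actual code.

```python
# Hypothetical VoTA-style sketch; function names and the decomposition
# strategy are placeholders, not the paper's released code.
from PIL import Image, ImageDraw


def decompose_into_thoughts(query: str, n_steps: int = 4) -> list[str]:
    """Split a risky query into stepwise 'visual thoughts'.
    The paper automates this; here it is a trivial stand-in."""
    return [f"Step {i + 1}: continue reasoning about '{query}'" for i in range(n_steps)]


def render_thought(text: str, size=(512, 128)) -> Image.Image:
    """Render one textual thought as an image, so the reasoning
    chain travels through the visual channel rather than the prompt."""
    img = Image.new("RGB", size, "white")
    ImageDraw.Draw(img).text((10, 10), text, fill="black")
    return img


def query_vlm(images: list[Image.Image], prompt: str) -> str:
    """Stub for the victim VLM; plug in the target model's API here."""
    raise NotImplementedError


def vota_attack(query: str) -> str:
    """Build the image chain and ask the model to complete it."""
    chain = [render_thought(t) for t in decompose_into_thoughts(query)]
    # The textual instruction stays benign; the risky content is carried
    # by the image chain, pitting reasoning against safety alignment.
    return query_vlm(chain, "Continue the reasoning shown in the images.")
```

The design point this sketch tries to capture is that the text prompt alone looks harmless; the unsafe reasoning is offloaded to the image sequence, which the model is implicitly asked to extend.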
Cite
Zhong et al. "Towards Visualization-of-Thought Jailbreak Attack Against Large Visual Language Models." Advances in Neural Information Processing Systems, 2025.
BibTeX
@inproceedings{zhong2025neurips-visualizationofthought,
title = {{Towards Visualization-of-Thought Jailbreak Attack Against Large Visual Language Models}},
author = {Zhong, Hongqiong and Teng, Qingyang and Zheng, Baolin and Chen, Guanlin and Tan, Yingshui and Liu, Zhendong and Liu, Jiaheng and Su, Wenbo and Zhu, Xiaoyong and Zheng, Bo and Zhang, Kaifu},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/zhong2025neurips-visualizationofthought/}
}