Latent Chain-of-Thought for Visual Reasoning

Abstract

Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PPO, and GRPO may not generalize well to unseen reasoning tasks and rely heavily on a biased reward model. To address this challenge, we reformulate reasoning in LVLMs as posterior inference and propose a scalable training algorithm based on amortized variational inference. By leveraging diversity-seeking reinforcement learning algorithms, we introduce a novel sparse reward function that provides token-level learning signals and encourages diverse, high-likelihood latent CoTs, overcoming the limitations of deterministic sampling and avoiding reward hacking. Additionally, we implement a Bayesian inference-scaling strategy that replaces costly Best-of-N and Beam Search with the marginal likelihood to efficiently rank candidate rationales and answers. We empirically demonstrate that the proposed method improves state-of-the-art LVLMs on four reasoning benchmarks in terms of effectiveness, generalization, and interpretability.
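
The posterior-inference view can be made concrete with the standard latent-variable formulation below. This is a minimal sketch of the usual setup, not necessarily the paper's exact objective, and the notation is ours: x denotes the multimodal input, z the latent CoT rationale, y the answer, p_θ the LVLM, and q_φ the amortized variational posterior. The answer likelihood marginalizes over rationales, and variational training maximizes the evidence lower bound (ELBO):

% Latent-variable view of CoT: the rationale z is marginalized out,
% and the ELBO trades reconstruction against a KL term to the prior.
\log p_\theta(y \mid x)
  = \log \sum_{z} p_\theta(z \mid x)\, p_\theta(y \mid x, z)
  \ge \mathbb{E}_{q_\phi(z \mid x, y)}\!\big[\log p_\theta(y \mid x, z)\big]
      - \mathrm{KL}\!\big(q_\phi(z \mid x, y) \,\|\, p_\theta(z \mid x)\big)

Under the same sketch, inference scaling by marginal likelihood amounts to aggregating sampled rationales rather than picking a single best one, as Best-of-N would:

% Monte Carlo estimate of the marginal answer likelihood,
% with rationales sampled from the model's own prior over CoTs.
p_\theta(y \mid x) \approx \frac{1}{N} \sum_{i=1}^{N} p_\theta(y \mid x, z_i),
  \qquad z_i \sim p_\theta(z \mid x)

Ranking candidate answers by this estimate scores each y by its support across many rationales, rather than by the score of one decoded chain.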

Cite

Text

Sun et al. "Latent Chain-of-Thought for Visual Reasoning." Advances in Neural Information Processing Systems, 2025.

Markdown

[Sun et al. "Latent Chain-of-Thought for Visual Reasoning." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/sun2025neurips-latent-a/)

BibTeX

@inproceedings{sun2025neurips-latent-a,
  title     = {{Latent Chain-of-Thought for Visual Reasoning}},
  author    = {Sun, Guohao and Hua, Hang and Wang, Jian and Luo, Jiebo and Dianat, Sohail and Rabbani, Majid and Rao, Raghuveer and Tao, Zhiqiang},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/sun2025neurips-latent-a/}
}