Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation

Abstract

Large Vision-Language Models (VLMs) have achieved remarkable progress in multimodal understanding, yet they struggle when reasoning over information-intensive images that densely interleave textual annotations with fine-grained graphical elements. The main challenges lie in precisely localizing critical cues in dense layouts and in performing multi-hop reasoning to integrate dispersed evidence. We propose Speculative Verdict (SV), a training-free framework inspired by speculative decoding that combines multiple lightweight draft experts with a large verdict model. In the draft stage, small VLMs act as draft experts and generate reasoning paths that provide diverse localization candidates; in the verdict stage, a strong VLM synthesizes these paths to produce the final answer, minimizing computational cost while recovering correct answers. To further improve both efficiency and accuracy, SV introduces a consensus expert selection mechanism that forwards only high-agreement reasoning paths to the verdict model. Empirically, SV achieves consistent gains on challenging information-intensive and high-resolution visual question answering benchmarks, including InfographicVQA, ChartMuseum, ChartQAPro, and HR-Bench 4K. By synthesizing correct insights from partially accurate reasoning paths, SV achieves both error correction and cost-efficiency compared to large proprietary models or training pipelines.
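
The draft, consensus-selection, and verdict stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `ReasoningPath`, `select_by_consensus`, `speculative_verdict`, and the stub models are hypothetical, and agreement is approximated here by exact match of normalized final answers, which may differ from the criterion used in the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReasoningPath:
    expert: str      # name of the draft expert (hypothetical field)
    rationale: str   # free-text reasoning produced by the expert
    answer: str      # final answer proposed by the expert


# Assumed interfaces: a draft expert maps (image, question) to a reasoning path;
# the verdict model maps (image, question, selected paths) to a final answer.
DraftExpert = Callable[[str, str], ReasoningPath]
VerdictModel = Callable[[str, str, List[ReasoningPath]], str]


def _normalize(answer: str) -> str:
    return answer.strip().lower()


def select_by_consensus(paths: List[ReasoningPath], keep: int) -> List[ReasoningPath]:
    """Keep the `keep` paths whose answers are shared by the most experts.

    Assumption: agreement is an exact match on normalized answers.
    """
    counts = Counter(_normalize(p.answer) for p in paths)
    # sorted() is stable, so ties keep the original expert order.
    ranked = sorted(paths, key=lambda p: counts[_normalize(p.answer)], reverse=True)
    return ranked[:keep]


def speculative_verdict(image: str, question: str,
                        drafts: List[DraftExpert],
                        verdict: VerdictModel,
                        keep: int = 2) -> str:
    paths = [draft(image, question) for draft in drafts]   # draft stage
    selected = select_by_consensus(paths, keep)            # consensus selection
    return verdict(image, question, selected)              # verdict stage


# Stub models standing in for real VLM calls (purely illustrative).
def make_stub_draft(name: str, answer: str) -> DraftExpert:
    return lambda image, question: ReasoningPath(name, f"{name} reads the chart", answer)


def stub_verdict(image: str, question: str, paths: List[ReasoningPath]) -> str:
    # A real verdict model would reason over the rationales and the image;
    # this stand-in just returns the most common forwarded answer.
    return Counter(_normalize(p.answer) for p in paths).most_common(1)[0][0]


drafts = [make_stub_draft("a", "42"), make_stub_draft("b", "42 "), make_stub_draft("c", "17")]
answer = speculative_verdict("chart.png", "What is the peak value?", drafts, stub_verdict)
print(answer)
```

In this sketch the large model is called once, on a reduced set of reasoning paths, which is the intuition behind the cost savings the abstract mentions; the stub verdict does not capture the error correction a real verdict model would perform.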

Cite

Text

Liu et al. "Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation." International Conference on Learning Representations, 2026.

Markdown

[Liu et al. "Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/liu2026iclr-small/)

BibTeX

@inproceedings{liu2026iclr-small,
  title     = {{Small Drafts, Big Verdict: Information-Intensive Visual Reasoning via Speculation}},
  author    = {Liu, Yuhan and Qin, Lianhui and Wan, Shenji},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/liu2026iclr-small/}
}