In-Batch Ensemble Drafting: Robust Speculative Decoding for LVLMs

Abstract

Despite the success of Speculative Decoding (SD) in accelerating LLM inference, it remains largely unexplored for Large Vision Language Models (LVLMs), an advanced class of LLMs that can handle multimodal prompts consisting of text and image tokens. To bridge this gap, we first conduct a comprehensive benchmarking study focused on the effectiveness of various drafting methods. We observe that each drafting method has its own advantages and that none consistently outperforms the others. Motivated by this observation, we propose **In-batch Ensemble Drafting (IbED)**, a simple yet effective SD method for LVLMs. IbED combines multiple drafting methods through batch inference, which adds little latency. Compared to multimodal drafting, it consistently and significantly improves block efficiency, by 6% on average (with a maximum of 23%) across a wide range of datasets.
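To make the terms above concrete, here is a minimal sketch rather than the authors' implementation. The abstract does not say how the drafts are combined, so the sketch assumes the next-token distributions of the batched drafters are averaged and that verification is greedy. Block efficiency is taken in its usual sense: the average number of tokens produced per target-model call. All function names (`ensemble_next_token`, `accepted_length`, `block_efficiency`) are hypothetical.

```python
from typing import Dict, Sequence


def ensemble_next_token(dists: Sequence[Dict[str, float]]) -> str:
    """Pick the draft token from the mean of several drafters' distributions.

    ASSUMPTION: uniform averaging; the abstract does not specify the rule.
    Each dict maps a candidate token to its probability under one drafter
    (for example, one row of a single batched forward pass).
    """
    vocab = set().union(*dists)
    avg = {tok: sum(d.get(tok, 0.0) for d in dists) / len(dists) for tok in vocab}
    # Iterate in sorted order so ties are broken deterministically.
    return max(sorted(avg), key=avg.get)


def accepted_length(draft: Sequence[str], target: Sequence[str]) -> int:
    """Tokens gained from one target call under greedy verification.

    Draft tokens are accepted while they match the target's own choices;
    the target then contributes one more token (a correction or a bonus).
    """
    matched = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        matched += 1
    return matched + 1


def block_efficiency(lengths: Sequence[int]) -> float:
    """Average number of tokens generated per target-model call."""
    return sum(lengths) / len(lengths)


if __name__ == "__main__":
    # Two hypothetical drafters disagree; the averaged distribution decides.
    multimodal = {"cat": 0.6, "dog": 0.4}
    text_only = {"cat": 0.2, "dog": 0.8}
    print(ensemble_next_token([multimodal, text_only]))
    print(block_efficiency([accepted_length(["a", "b", "c"], ["a", "b", "x", "y"]), 1, 2]))
```

Under these assumptions, an ensemble improves block efficiency whenever the averaged distribution matches the target's token more often than any single drafter does, while batching keeps the drafting cost close to that of one drafter.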

Cite

Text

Lee et al. "In-Batch Ensemble Drafting: Robust Speculative Decoding for LVLMs." ICLR 2025 Workshops: SCOPE, 2025.

Markdown

[Lee et al. "In-Batch Ensemble Drafting: Robust Speculative Decoding for LVLMs." ICLR 2025 Workshops: SCOPE, 2025.](https://mlanthology.org/iclrw/2025/lee2025iclrw-inbatch/)

BibTeX

@inproceedings{lee2025iclrw-inbatch,
  title     = {{In-Batch Ensemble Drafting: Robust Speculative Decoding for LVLMs}},
  author    = {Lee, Minjae and Kang, Wonjun and Ahn, Byeongkeun and Classen, Christian and Yan, Minghao and Koo, Hyung Il and Lee, Kangwook},
  booktitle = {ICLR 2025 Workshops: SCOPE},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/lee2025iclrw-inbatch/}
}