Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding

Hu, Shijing; Li, Jingyang; Lu, Zhihui; Zhou, Pan

ICLR 2026

Abstract

Speculative decoding accelerates large language model (LLM) inference by letting a lightweight draft model propose multiple tokens that the target model verifies in parallel. Yet existing training objectives optimize only a single greedy draft path, while decoding follows a tree policy that re-ranks and verifies multiple branches. This draft policy misalignment limits achievable speedups. We introduce **Group Tree Optimization** (GTO), which aligns training with the decoding-time tree policy through two components: (i) Draft Tree Reward, a sampling-free objective equal to the expected acceptance length of the draft tree under the target model, directly measuring decoding performance; (ii) Group-based Draft Policy Training, a stable optimization scheme that contrasts trees from the current draft model and a frozen reference draft model, forming debiased group-standardized advantages and applying a PPO-style surrogate along the longest accepted sequence for robust updates. We further prove that increasing the Draft Tree Reward improves acceptance length and speedup. Across dialogue (MT-Bench), code (HumanEval), and math (GSM8K) benchmarks and multiple LLMs (e.g., LLaMA-3.1-8B, LLaMA-3.3-70B, Vicuna-1.3-13B, DeepSeek-R1-Distill-LLaMA-8B, Qwen3-8B), GTO increases acceptance length by \(7.4\%\) and yields an additional \(7.7\%\) speedup over the prior state-of-the-art EAGLE-3. By bridging draft policy misalignment, GTO offers a practical, general solution for efficient LLM inference. Code and draft models are available at https://github.com/hsj576/GTO.
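
To make the two quantities named in the abstract concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simplified verification model in which each draft node carries the target model's probability of its token given the prefix, and sibling tokens are distinct, so the events of reaching sibling nodes are mutually exclusive. Under that assumption, the expected acceptance length of a tree is the sum, over all nodes, of the product of target probabilities along the path to that node. The names `DraftNode`, `expected_acceptance_length`, and `group_standardized_advantages` are hypothetical, and the debiasing step and PPO-style surrogate mentioned in the abstract are not modeled here.

```python
from dataclasses import dataclass, field
from statistics import fmean, pstdev
from typing import List


@dataclass
class DraftNode:
    # Assumption: target-model probability of this node's token given its prefix.
    target_prob: float
    children: List["DraftNode"] = field(default_factory=list)


def expected_acceptance_length(nodes: List[DraftNode], prefix_prob: float = 1.0) -> float:
    """Sum over all nodes of the probability that the node is accepted.

    A node is accepted only if every token on its path is accepted, so its
    acceptance probability is the product of target probabilities on the path.
    """
    total = 0.0
    for node in nodes:
        reach = prefix_prob * node.target_prob
        total += reach + expected_acceptance_length(node.children, reach)
    return total


def group_standardized_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within a group: (r - mean) / (std + eps)."""
    mean = fmean(rewards)
    std = pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy draft tree (illustrative numbers only): two first-level candidates,
# the first of which has two continuations.
tree = [
    DraftNode(0.6, [DraftNode(0.5), DraftNode(0.3)]),
    DraftNode(0.3),
]
reward = expected_acceptance_length(tree)  # 0.6 + 0.3 + 0.18 + 0.3, about 1.38

# Rewards of several trees in one group, turned into zero-mean advantages.
advantages = group_standardized_advantages([1.0, 2.0, 3.0])
```

Because the reward is computed in closed form from probabilities rather than by sampling accepted paths, it matches the abstract's "sampling-free" description in spirit. The exact definition used by GTO may differ from this sketch.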

Cite

Text

Hu et al. "Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding." International Conference on Learning Representations, 2026.

Markdown

[Hu et al. "Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hu2026iclr-bridging/)

BibTeX

@inproceedings{hu2026iclr-bridging,
  title     = {{Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding}},
  author    = {Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/hu2026iclr-bridging/}
}