Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding
Abstract
Speculative decoding accelerates large language model (LLM) inference by letting a lightweight draft model propose multiple tokens that the target model verifies in parallel. Yet existing training objectives optimize only a single greedy draft path, while decoding follows a tree policy that re-ranks and verifies multiple branches. This draft policy misalignment limits achievable speedups. We introduce **Group Tree Optimization** (GTO), which aligns training with the decoding-time tree policy through two components: (i) Draft Tree Reward, a sampling-free objective equal to the expected acceptance length of the draft tree under the target model, directly measuring decoding performance; (ii) Group-based Draft Policy Training, a stable optimization scheme that contrasts trees from the current and a frozen reference draft model, forming debiased group-standardized advantages and applying a PPO-style surrogate along the longest accepted sequence for robust updates. We further prove that increasing our Draft Tree Reward improves acceptance length and speedup. Across dialogue (MT-Bench), code (HumanEval), and math (GSM8K) benchmarks and multiple LLMs (e.g., LLaMA-3.1-8B, LLaMA-3.3-70B, Vicuna-1.3-13B, DeepSeek-R1-Distill-LLaMA-8B, Qwen3-8B), GTO increases acceptance length by \(7.4\%\) and yields an additional \(7.7\%\) speedup over the prior state-of-the-art EAGLE-3. By bridging draft policy misalignment, GTO offers a practical, general solution for efficient LLM inference. Code and draft models are available at https://github.com/hsj576/GTO.
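As a rough illustration of the two ideas named in the abstract, the sketch below computes an expected acceptance length for a draft tree without sampling, then standardizes rewards within a group. It is not the authors' implementation. It assumes a simplified verification rule in which a node is accepted exactly when the target model's token at that position matches it, so the probability of reaching a node is the product of target-model probabilities along its path. The names `DraftNode`, `target_prob`, `expected_acceptance_length`, and `group_standardized_advantages` are hypothetical, and the debiasing step and PPO-style surrogate mentioned in the abstract are not reproduced here.

```python
import statistics
from dataclasses import dataclass, field
from typing import List


@dataclass
class DraftNode:
    # Assumed input: target-model probability of this node's token given its prefix.
    target_prob: float
    children: List["DraftNode"] = field(default_factory=list)


def expected_acceptance_length(roots: List[DraftNode], reach_prob: float = 1.0) -> float:
    """Sum, over all nodes, of the probability that the node is accepted.

    Under the simplifying assumption above, a node is accepted with probability
    equal to the product of target probabilities along its root-to-node path.
    Sibling tokens are assumed distinct, so their acceptance events are disjoint.
    """
    total = 0.0
    for node in roots:
        p = reach_prob * node.target_prob
        total += p + expected_acceptance_length(node.children, p)
    return total


def group_standardized_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Plain group standardization: subtract the group mean, divide by the group std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Toy tree: two first-level candidates; the first has two continuations.
tree = [
    DraftNode(0.6, [DraftNode(0.5), DraftNode(0.3)]),
    DraftNode(0.3),
]
reward = expected_acceptance_length(tree)  # 0.6 + 0.6*0.5 + 0.6*0.3 + 0.3
advantages = group_standardized_advantages([1.0, 2.0, 3.0])
```

In this toy setup the reward is a deterministic function of the tree and the target probabilities, which is what makes such an objective sampling-free; how GTO actually builds the groups and applies the surrogate is described in the paper and the linked repository.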
Cite
Text
Hu et al. "Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding." International Conference on Learning Representations, 2026.
Markdown
[Hu et al. "Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/hu2026iclr-bridging/)
BibTeX
@inproceedings{hu2026iclr-bridging,
title = {{Bridging Draft Policy Misalignment: Group Tree Optimization for Speculative Decoding}},
author = {Hu, Shijing and Li, Jingyang and Lu, Zhihui and Zhou, Pan},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/hu2026iclr-bridging/}
}