Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding

Abstract

Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to a higher number of accepted tokens compared to token-wise verification. However, existing solutions often rely on surrogate approximations or are constrained by partial information, struggling with joint intractability. In this work, we propose Hierarchical Speculative Decoding (HSD), a provably lossless verification method that significantly boosts the expected number of accepted tokens and overcomes joint intractability by balancing excess and deficient probability mass across accessible branches. Our extensive large-scale experiments demonstrate that HSD yields consistent improvements in acceptance rates across diverse model families and benchmarks. Moreover, its strong explainability and generality make it readily integrable into a wide range of speculative decoding frameworks. Notably, integrating HSD into EAGLE-3 yields a performance gain of over 12%, establishing state-of-the-art decoding efficiency without compromising distribution fidelity. Code is available at https://github.com/ZhouYuxuanYX/Hierarchical-Speculative-Decoding.
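For readers unfamiliar with the token-wise verification that the abstract uses as its point of comparison, the following is a minimal sketch of the standard lossless accept/reject rule from speculative sampling. It is not the HSD method and is not taken from the authors' repository; the function name `verify_tokenwise` and its argument layout are hypothetical, and the draft and target distributions are assumed to be given as dense NumPy arrays.

```python
import numpy as np


def verify_tokenwise(draft_tokens, q_probs, p_probs, rng):
    """Baseline token-wise verification (illustrative, not HSD).

    draft_tokens: K token ids proposed by the draft model.
    q_probs: array of shape (K, V), draft distribution at each position.
    p_probs: array of shape (K + 1, V), target distribution at each
             position; the last row is used for the bonus token.
    rng: a numpy.random.Generator.
    Returns the list of tokens emitted in this verification step.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        p, q = p_probs[i], q_probs[i]
        # Accept the drafted token with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(int(tok))
            continue
        # On rejection, resample from the normalized residual max(p - q, 0)
        # and stop; this keeps the output distributed according to p.
        residual = np.maximum(p - q, 0.0)
        residual /= residual.sum()
        out.append(int(rng.choice(len(p), p=residual)))
        return out
    # Every drafted token was accepted: sample one bonus token from the target.
    out.append(int(rng.choice(p_probs.shape[1], p=p_probs[-1])))
    return out
```

In this sketch each position is judged independently, so a single rejection discards every later drafted token. Sequence-level schemes such as the one proposed in the paper aim to accept more tokens per step by reasoning about probability mass across positions instead.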

Cite

Text

Zhou et al. "Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding." International Conference on Learning Representations, 2026.

Markdown

[Zhou et al. "Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhou2026iclr-overcoming/)

BibTeX

@inproceedings{zhou2026iclr-overcoming,
  title     = {{Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding}},
  author    = {Zhou, Yuxuan and Huang, Fei and Li, Heng and Wu, Fengyi and Wang, Tianyu and Zhang, Jianwei and Lin, Junyang and Cheng, Zhi-Qi},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/zhou2026iclr-overcoming/}
}