Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding
Abstract
Speculative decoding improves LLM inference by generating and verifying multiple tokens in parallel, but existing systems suffer from suboptimal performance due to a mismatch between dynamic speculation and static runtime assumptions. We present Yggdrasil, a co-designed system that enables latency-optimal speculative decoding through context-aware tree drafting and compiler-friendly execution. Yggdrasil introduces an equal-growth tree structure for static graph compatibility, a latency-aware optimization objective for draft selection, and stage-based scheduling to reduce overhead. Yggdrasil supports unmodified LLMs and achieves up to $3.98\times$ speedup over state-of-the-art baselines across multiple hardware setups.
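The central systems idea in the abstract is that a speculation tree with a fixed, input-independent shape can be baked into a statically compiled graph: if the number and layout of draft positions never change between decoding steps, the verification kernel's tensor shapes stay constant. As a rough illustration only (not the paper's implementation), the sketch below enumerates an "equal-growth" tree whose fanout depends only on depth; the branching factors and the function name are hypothetical.

```python
# Hypothetical sketch: an "equal-growth" speculation tree expands every node
# at depth d by the same fanout branching[d], so the tree shape (and thus the
# shapes seen by a compiled verification graph) is identical at every step.
from itertools import product

def equal_growth_tree(branching=(4, 2, 2)):
    """Enumerate node paths of a tree whose depth-d fanout is branching[d].

    Each returned tuple is the child-index path from the root to one draft
    position. The total node count is fixed by the branching factors alone:
    4 + 4*2 + 4*2*2 = 28 for the illustrative defaults above.
    """
    nodes = []
    for depth in range(1, len(branching) + 1):
        # All child-index combinations up to this depth are the nodes there.
        nodes.extend(product(*(range(b) for b in branching[:depth])))
    return nodes

if __name__ == "__main__":
    tree = equal_growth_tree()
    print(len(tree))   # 28 draft positions, the same at every decoding step
    print(tree[:5])    # [(0,), (1,), (2,), (3,), (0, 0)]
```

Under this assumption, a context-aware drafter can still choose *which tokens* fill the 28 slots per step; only the tree's shape is frozen, which is what keeps the execution graph static.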
Cite
Text
Guan et al. "Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding." Advances in Neural Information Processing Systems, 2025.
Markdown
[Guan et al. "Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/guan2025neurips-yggdrasil/)
BibTeX
@inproceedings{guan2025neurips-yggdrasil,
title = {{Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding}},
author = {Guan, Yue and Yu, Changming and Fang, Shihan and Hu, Weiming and Pan, Zaifeng and Wang, Zheng and Liu, Zihan and Zhou, Yangjie and Ding, Yufei and Guo, Minyi and Leng, Jingwen},
booktitle = {Advances in Neural Information Processing Systems},
year = {2025},
url = {https://mlanthology.org/neurips/2025/guan2025neurips-yggdrasil/}
}