SpecTr++: Improved Transport Plans for Speculative Decoding of Large Language Models
Abstract
We revisit the question of accelerating decoding of language models based on speculative draft samples, inspired by Y. Leviathan et al. (ICML 2023). Following Z. Sun et al. (NeurIPS 2023), which connects speculative decoding to optimal transport theory, we design improved transport plans for this problem with no sacrifice in computational complexity in terms of the alphabet size.
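For context, the single-draft acceptance rule of Leviathan et al. (ICML 2023), which SpecTr and this paper generalize through optimal transport, can be sketched as follows. This is a minimal illustration under our own naming (speculative_accept and the arrays p, q are not from the paper), not the improved multi-draft transport plan proposed here.

import numpy as np

def speculative_accept(p, q, draft_token, rng):
    # Baseline speculative sampling step: given target distribution p and
    # draft distribution q over the vocabulary, and a token sampled from q,
    # return a token distributed exactly according to p.
    # Accept the draft token with probability min(1, p(x)/q(x)).
    if rng.random() < min(1.0, p[draft_token] / q[draft_token]):
        return draft_token
    # On rejection, resample from the normalized residual (p - q)_+,
    # which corrects the output distribution back to p.
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return rng.choice(len(p), p=residual)

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])  # target-model distribution
q = np.array([0.3, 0.5, 0.2])  # draft-model distribution
draft = rng.choice(3, p=q)     # token proposed by the draft model
token = speculative_accept(p, q, draft, rng)

This acceptance rule is itself a (suboptimal, in the multi-draft setting) transport plan between q and p; the paper's contribution is constructing better plans without increasing the dependence on the vocabulary size.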
Cite
Text
Ahn et al. "SpecTr++: Improved Transport Plans for Speculative Decoding of Large Language Models." NeurIPS 2023 Workshops: OTML, 2023.
Markdown
[Ahn et al. "SpecTr++: Improved Transport Plans for Speculative Decoding of Large Language Models." NeurIPS 2023 Workshops: OTML, 2023.](https://mlanthology.org/neuripsw/2023/ahn2023neuripsw-spectr/)
BibTeX
@inproceedings{ahn2023neuripsw-spectr,
  title = {{SpecTr++: Improved Transport Plans for Speculative Decoding of Large Language Models}},
  author = {Ahn, Kwangjun and Beirami, Ahmad and Sun, Ziteng and Suresh, Ananda Theertha},
  booktitle = {NeurIPS 2023 Workshops: OTML},
  year = {2023},
  url = {https://mlanthology.org/neuripsw/2023/ahn2023neuripsw-spectr/}
}