Parallel Token Prediction for Language Models

Abstract

Autoregressive decoding in language models is inherently slow, generating only one token per forward pass. We propose Parallel Token Prediction (PTP), a general-purpose framework for predicting multiple tokens in a single model call. PTP moves the source of randomness from post-hoc sampling to random input variables, making future tokens deterministic functions of those inputs and thus jointly predictable in a single forward pass. We prove that a single PTP call can represent arbitrary dependencies between tokens. PTP is trained by distilling an existing model or through inverse autoregressive training without a teacher. Experimentally, PTP achieves a 2.4$\times$ speedup on a diverse-task speculative decoding benchmark. We provide code and checkpoints at https://github.com/mandt-lab/ptp.
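The idea of moving randomness into the inputs can be illustrated with standard inverse-CDF sampling. The sketch below is not the authors' implementation: it assumes the random inputs are uniform draws on [0, 1), and `toy_next_token_probs` is a hypothetical stand-in for a language model over a three-token vocabulary. It only shows that once the random inputs are fixed, every token is a deterministic function of them. The loop here is still sequential, whereas PTP aims to predict these tokens jointly in one model call.

```python
import bisect
import itertools


def token_from_uniform(probs, u):
    """Map a uniform draw u in [0, 1) to a token index via the inverse CDF."""
    cdf = list(itertools.accumulate(probs))
    # Clamp guards against floating-point round-off in the last CDF entry.
    return min(bisect.bisect_right(cdf, u), len(probs) - 1)


def toy_next_token_probs(prefix):
    """Hypothetical stand-in for a language model (assumption, 3-token vocabulary)."""
    if not prefix:
        return [0.5, 0.3, 0.2]
    # The distribution depends on the previous token only, for simplicity.
    table = {0: [0.1, 0.6, 0.3], 1: [0.7, 0.2, 0.1], 2: [0.25, 0.25, 0.5]}
    return table[prefix[-1]]


def decode(us):
    """With all uniforms given up front, the sequence is a deterministic function of them."""
    tokens = []
    for u in us:
        tokens.append(token_from_uniform(toy_next_token_probs(tokens), u))
    return tokens
```

Calling `decode` twice with the same list of uniforms returns the same token sequence, so all sampling randomness lives in the inputs rather than in a post-hoc sampling step.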

Cite

Text

Draxler et al. "Parallel Token Prediction for Language Models." International Conference on Learning Representations, 2026.

Markdown

[Draxler et al. "Parallel Token Prediction for Language Models." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/draxler2026iclr-parallel/)

BibTeX

@inproceedings{draxler2026iclr-parallel,
  title     = {{Parallel Token Prediction for Language Models}},
  author    = {Draxler, Felix and Will, Justus and Sofian, Farrin Marouf and Karaletsos, Theofanis and Singh, Sameer and Mandt, Stephan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/draxler2026iclr-parallel/}
}