Self-Speculative Masked Diffusions

Abstract

We present self-speculative masked diffusions, a new class of masked diffusion generative models for discrete data that require significantly fewer function evaluations to generate samples. Standard masked diffusion models predict factorized logits over the currently masked positions, and a subset of those positions is then sampled. Because the factorization is only an approximation, sampling too many positions at once leads to poor sample quality. As a result, many simulation steps, and therefore many neural network function evaluations, are required to generate high-quality data. We reduce this computational burden by generating non-factorized predictions over masked positions. This is achieved by changing the final transformer attention mask from non-causal to causal, which enables draft token generation and parallel validation via a novel, model-integrated speculative sampling mechanism. The result is a non-factorized predictive distribution over masked positions in a single forward pass. We apply our method to GPT2-scale text modelling and protein sequence generation, and find that it achieves a roughly 2x reduction in the number of network forward passes required relative to standard masked diffusion models.
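
To make the draft-then-validate idea concrete, the following is a minimal sketch of the generic speculative sampling accept/reject rule, not the paper's model-integrated mechanism, whose details the abstract does not give. The function name `speculative_accept` and its array arguments are hypothetical. It assumes the draft tokens were sampled from per-position draft distributions and that a target model has scored the same positions in parallel.

```python
import numpy as np

def speculative_accept(draft_probs, target_probs, draft_tokens, rng):
    """Generic speculative sampling over n drafted positions (illustrative only).

    draft_probs:  (n, V) distributions the draft tokens were sampled from.
    target_probs: (n, V) target distributions scored in parallel.
    draft_tokens: (n,) drafted token ids.
    Returns the accepted tokens, ending with one resampled token on rejection.
    """
    out = []
    for i, tok in enumerate(draft_tokens):
        q = draft_probs[i, tok]
        p = target_probs[i, tok]
        # Accept the drafted token with probability min(1, p / q).
        if rng.random() < min(1.0, p / q):
            out.append(int(tok))
            continue
        # On rejection, resample from the normalized residual max(p - q, 0)
        # and discard all later drafted tokens.
        residual = np.maximum(target_probs[i] - draft_probs[i], 0.0)
        residual /= residual.sum()
        out.append(int(rng.choice(len(residual), p=residual)))
        break
    return out
```

Under this rule each emitted token is distributed according to the target distribution, so quality is set by the target while the number of tokens accepted per pass depends on how closely the draft matches it.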

Cite

Text

Campbell et al. "Self-Speculative Masked Diffusions." International Conference on Learning Representations, 2026.

Markdown

[Campbell et al. "Self-Speculative Masked Diffusions." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/campbell2026iclr-selfspeculative/)

BibTeX

@inproceedings{campbell2026iclr-selfspeculative,
  title     = {{Self-Speculative Masked Diffusions}},
  author    = {Campbell, Andrew and De Bortoli, Valentin and Shi, Jiaxin and Doucet, Arnaud},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/campbell2026iclr-selfspeculative/}
}