Any-Order Flexible Length Masked Diffusion
Abstract
Masked diffusion models (MDMs) have recently emerged as a promising alternative to autoregressive models over discrete domains. MDMs generate sequences in an any-order, parallel fashion, enabling fast inference and strong performance on non-causal tasks. However, a crucial limitation is that they do not support token insertions and are thus limited to *fixed-length* generations. To address this, we introduce **Flex**ible **M**asked **D**iffusion **M**odels (FlexMDMs), a discrete diffusion paradigm that can model sequences of flexible length while provably retaining MDMs' flexibility of any-order inference. Grounded in an extension of the stochastic interpolant framework, FlexMDMs generate sequences by inserting mask tokens and unmasking them. Empirically, we show that FlexMDMs match MDMs in perplexity while modeling length statistics with much higher fidelity. On a synthetic maze planning task, they achieve an approximately 60% higher success rate than MDM baselines. Finally, we show that pretrained MDMs can easily be *retrofitted* into FlexMDMs: on 16 H100s, it takes only three days to fine-tune LLaDA-8B into a FlexMDM, which improves performance on math (GSM8K, 58% → 67%) and code infilling (52% → 65%).
Cite
Text
Kim et al. "Any-Order Flexible Length Masked Diffusion." International Conference on Learning Representations, 2026.
Markdown
[Kim et al. "Any-Order Flexible Length Masked Diffusion." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/kim2026iclr-anyorder/)
BibTeX
@inproceedings{kim2026iclr-anyorder,
title = {{Any-Order Flexible Length Masked Diffusion}},
author = {Kim, Jaeyeon and Kit, Lee Cheuk and Domingo-Enrich, Carles and Du, Yilun and Kakade, Sham M. and Ngotiaoco, Timothy and Chen, Sitan and Albergo, Michael Samuel},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/kim2026iclr-anyorder/}
}