ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding

Li, Jia-Nan; Guan, Jian; Wu, Wei; Li, Chongxuan

ICLR 2026

Abstract

Autoregressive models (ARMs) are hindered by slow sequential inference. While masked diffusion models (MDMs) offer a parallel alternative, they suffer from critical drawbacks: high computational overhead from precluding Key-Value (KV) caching, and incoherent generation arising from learning dependencies over an intractable space of token combinations. To address these limitations, we introduce ReFusion, a novel masked diffusion model that integrates sequence reorganization into the causal attention framework. By elevating parallel decoding from the token level to a higher slot level, ReFusion interleaves inter-slot diffusion-based selection with intra-slot autoregressive infilling, while reordering newly generated slots ahead of the remaining masks after each iteration. Consequently, this design simultaneously unlocks full KV cache reuse and reduces learning complexity from an intractable token combination space to a manageable slot-level permutation space. Extensive experiments on seven diverse benchmarks show that ReFusion not only overwhelmingly surpasses prior MDMs with a 34% performance gain and an over 18× speedup on average, but also bridges the performance gap to strong ARMs while maintaining a 2.33× average speedup.
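The decoding loop described in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the names `slot_decode`, `toy_score`, and `toy_fill` are hypothetical, the scorer and filler are stand-ins for a real model, and slots chosen in one step are filled in a plain Python loop rather than in parallel. The sketch only shows the control flow: pick some masked slots, fill each one left to right, then place the new slots ahead of the remaining masks so the generated prefix only ever grows, which is the property a causal KV cache relies on.

```python
from typing import Callable, List, Tuple

# A generated slot is stored as (original_slot_index, tokens).
Slot = Tuple[int, List[str]]


def slot_decode(
    num_slots: int,
    slot_len: int,
    score_slot: Callable[[List[Slot], int], float],
    fill_slot: Callable[[List[Slot], int, int], List[str]],
    slots_per_step: int = 2,
) -> Tuple[List[str], List[int]]:
    """Toy slot-level decoding loop (illustrative assumption, not the paper's code).

    Returns the tokens in original position order and the order in which
    slots were generated.
    """
    done: List[Slot] = []                   # generated slots, in generation order
    masked: List[int] = list(range(num_slots))  # original indices still masked

    while masked:
        # Inter-slot selection: rank the masked slots and keep the top few.
        ranked = sorted(masked, key=lambda i: score_slot(done, i), reverse=True)
        chosen = ranked[:slots_per_step]

        # Intra-slot infilling: every chosen slot is conditioned on the same
        # prefix. A real model would decode these slots in parallel.
        prefix = list(done)
        new_slots = [(i, fill_slot(prefix, i, slot_len)) for i in chosen]

        # Reordering: new slots are appended to the generated prefix, so they
        # sit ahead of all remaining masks and the prefix is never rewritten.
        done.extend(new_slots)
        masked = [i for i in masked if i not in chosen]

    # Restore the original slot order to read off the final sequence.
    ordered = sorted(done, key=lambda slot: slot[0])
    tokens = [tok for _, toks in ordered for tok in toks]
    return tokens, [i for i, _ in done]


def toy_score(prefix: List[Slot], idx: int) -> float:
    # Hypothetical scorer: prefer even-indexed slots, then lower indices.
    return (10.0 if idx % 2 == 0 else 0.0) - idx


def toy_fill(prefix: List[Slot], idx: int, slot_len: int) -> List[str]:
    # Hypothetical filler: emit placeholder tokens one at a time, left to right.
    tokens: List[str] = []
    for j in range(slot_len):
        tokens.append(f"s{idx}t{j}")
    return tokens


tokens, order = slot_decode(4, 2, toy_score, toy_fill, slots_per_step=2)
print(order)   # generation order of the slots
print(tokens)  # tokens read back in original position order
```

With the toy scorer, slots are generated out of positional order, yet the final sequence is read back in position order because each slot keeps its original index.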

Cite

Text

Li et al. "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-refusion/)

BibTeX

@inproceedings{li2026iclr-refusion,
  title     = {{ReFusion: A Diffusion Large Language Model with Parallel Autoregressive Decoding}},
  author    = {Li, Jia-Nan and Guan, Jian and Wu, Wei and Li, Chongxuan},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-refusion/}
}