DPad: Efficient Diffusion Language Models with Suffix Dropout
Abstract
Diffusion-based Large Language Models (dLLMs) parallelize text generation by framing decoding as a denoising process, but suffer from high computational overhead since they predict all future suffix tokens at each step while retaining only a small fraction. We propose $\textbf{Diffusion Scratchpad} (\textbf{\textit{DPad}})$, a training-free method that restricts attention to a structured subset of suffix tokens, preserving fidelity while eliminating redundancy. $\textit{DPad}$ integrates two strategies: (i) a $\textit{sliding window}$, which maintains a fixed-length suffix window, and (ii) $\textit{distance-decay dropout}$, which deterministically removes distant suffix tokens before attention computation. This concise design is compatible with existing optimizations such as parallel decoding and prefix caching, and lends itself to a lightweight implementation. Comprehensive evaluations across multiple benchmarks on $\texttt{LLaDA}$ and $\texttt{Dream}$ models demonstrate that $\textit{DPad}$ delivers up to $\mathbf{61.4\times}$ speedup over vanilla dLLMs while maintaining comparable accuracy, highlighting its potential for efficient and scalable long-sequence inference.
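As a rough illustration of how a sliding window and distance-decay dropout could combine, the sketch below selects which suffix positions remain visible to attention. The abstract does not give the decay schedule, so the gap-doubling rule here is an assumption made for illustration, and the function name and the `window` and `dense` parameters are hypothetical rather than taken from the DPad implementation.

```python
def select_suffix_positions(block_end, seq_len, window=32, dense=4):
    """Return the absolute suffix positions kept for attention.

    Illustrative assumptions (not the paper's exact rule):
      * sliding window: nothing at or beyond block_end + window is kept;
      * distance decay: the first `dense` suffix tokens are all kept,
        after which the gap between kept tokens doubles at every step.
    The selection is deterministic: the same inputs give the same positions.
    """
    limit = min(seq_len, block_end + window)  # right edge of the suffix window
    kept = []
    pos, gap = block_end, 1
    while pos < limit:
        kept.append(pos)
        # Once the dense region is covered, thin out tokens geometrically.
        if pos - block_end + 1 >= dense:
            gap *= 2
        pos += gap
    return kept


if __name__ == "__main__":
    # Current block ends at index 10 in a 100-token sequence.
    print(select_suffix_positions(block_end=10, seq_len=100))
```

Under these assumptions the number of kept suffix tokens grows roughly logarithmically with the window length, which is the kind of redundancy reduction the abstract describes; the actual schedule used by the authors may differ.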
Cite
Text
Chen et al. "DPad: Efficient Diffusion Language Models with Suffix Dropout." International Conference on Learning Representations, 2026.
Markdown
[Chen et al. "DPad: Efficient Diffusion Language Models with Suffix Dropout." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/chen2026iclr-dpad/)
BibTeX
@inproceedings{chen2026iclr-dpad,
title = {{DPad: Efficient Diffusion Language Models with Suffix Dropout}},
author = {Chen, Xinhua and Huang, Sitao and Guo, Cong and Wei, Chiyue and He, Yintao and Zhang, Jianyi and Li, Hai Helen and Chen, Yiran},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/chen2026iclr-dpad/}
}