Continuously Augmented Discrete Diffusion Model for Categorical Generative Modeling
Abstract
Standard discrete diffusion models treat all unobserved states the same way, typically mapping them to an absorbing [MASK] token. This creates an "information void": global semantic information that could be inferred for the masked tokens from the unmasked tokens is not passed directly from one denoising step to the next. We introduce **Continuously Augmented Discrete Diffusion (CADD)**, a framework that augments the discrete state space with a paired diffusion in a continuous latent space. This yields graded, gradually corrupted states in which masked tokens are represented by noisy yet informative latent vectors rather than information voids. At each reverse step, CADD uses the continuous latent as a semantic hint to guide discrete denoising. The design is clean and compatible with existing discrete diffusion training. At sampling time, the strength and the choice of estimator for the continuous latent vector enable a controlled trade-off between mode-coverage (diversity-oriented) and mode-seeking (context-localization-oriented) behavior. Empirically, we demonstrate that CADD improves generative quality over mask-based diffusion across text generation, image synthesis, and code modeling, with consistent gains on both qualitative and quantitative metrics against strong discrete baselines. Our codebase is available at: https://github.com/apple/ml-CADD
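To make the idea of "graded" corrupted states concrete, the following is a minimal sketch of one plausible reading of the abstract. It is not the authors' implementation. The `MASK` id, the `corrupt` function, the linear masking probability, and the variance-preserving Gaussian noise schedule are all assumptions chosen for illustration; the released codebase may differ on each of these.

```python
import math
import random

MASK = -1  # hypothetical id standing in for the absorbing [MASK] state


def corrupt(tokens, embeddings, t, rng):
    """Jointly corrupt a token sequence at noise level t in [0, 1].

    Assumed scheme (illustrative only):
      * each token is replaced by MASK independently with probability t;
      * a masked position keeps a noisy copy of its clean embedding,
        sqrt(1 - t) * e + sqrt(t) * eps with eps ~ N(0, I);
      * an unmasked position keeps its clean embedding.
    Returns (discrete_states, latents).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must be in [0, 1]")
    keep_scale = math.sqrt(1.0 - t)
    noise_scale = math.sqrt(t)
    discrete, latents = [], []
    for tok in tokens:
        clean = embeddings[tok]
        if rng.random() < t:
            # Masked: the discrete state is lost, but the latent still
            # carries a noisy hint about the original token.
            discrete.append(MASK)
            latents.append(
                [keep_scale * c + noise_scale * rng.gauss(0.0, 1.0) for c in clean]
            )
        else:
            # Unmasked: both the token and its clean embedding survive.
            discrete.append(tok)
            latents.append(list(clean))
    return discrete, latents
```

Under these assumptions, a position masked at an intermediate `t` still has a latent correlated with its clean embedding, which is the kind of semantic hint a denoiser could condition on. Only as `t` approaches 1 does the latent become pure noise, matching the plain absorbing-mask case.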
Cite
Text
Zheng et al. "Continuously Augmented Discrete Diffusion Model for Categorical Generative Modeling." International Conference on Learning Representations, 2026.
Markdown
[Zheng et al. "Continuously Augmented Discrete Diffusion Model for Categorical Generative Modeling." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zheng2026iclr-continuously/)
BibTeX
@inproceedings{zheng2026iclr-continuously,
title = {{Continuously Augmented Discrete Diffusion Model for Categorical Generative Modeling}},
author = {Zheng, Huangjie and Gong, Shansan and Zhang, Ruixiang and Chen, Tianrong and Gu, Jiatao and Zhou, Mingyuan and Jaitly, Navdeep and Zhang, Yizhe},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zheng2026iclr-continuously/}
}