Di[M]O: Distilling Masked Diffusion Models into One-Step Generator
Abstract
Masked Diffusion Models (MDMs) have emerged as a powerful generative modeling technique. Despite their remarkable results, they typically suffer from slow inference because sampling requires several steps. In this paper, we propose Di[M]O, a novel approach that distills masked diffusion models into a one-step generator. Di[M]O addresses two key challenges: (1) the intractability of using intermediate-step information for one-step generation, which we solve through token-level distribution matching that optimizes the model's output logits within an 'on-policy' framework with the help of an auxiliary model; and (2) the lack of entropy in the initial distribution, which we address through a token initialization strategy that injects randomness while keeping the input close to the teacher's training distribution. We show Di[M]O's effectiveness on both class-conditional and text-conditional image generation, achieving performance competitive with the multi-step teacher's outputs while drastically reducing inference time. To our knowledge, we are the first to achieve one-step distillation of masked diffusion models and the first to apply discrete distillation to text-to-image generation, opening new paths for efficient generative modeling.
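The abstract names two ingredients without giving their formulas. The toy sketch below illustrates each under our own assumptions and is not the authors' implementation. `init_tokens` is a hypothetical initializer that keeps most positions as the mask token but replaces a fraction `random_ratio` of them with uniformly random vocabulary tokens, which is one plausible reading of "injects randomness while keeping the input close to the teacher's training distribution". `token_level_kl` computes a generic per-token KL divergence between two sets of logits, averaged over positions, as an example of token-level distribution matching. It does not reproduce the paper's on-policy objective or its auxiliary model; every name and parameter here (including the convention `mask_id == vocab_size`) is hypothetical.

```python
import math
import random
from typing import List, Sequence


def init_tokens(seq_len: int, vocab_size: int, mask_id: int,
                random_ratio: float, rng: random.Random) -> List[int]:
    """Hypothetical initialization: mostly [MASK], plus some random tokens.

    Each position independently becomes a uniformly random token in
    [0, vocab_size) with probability `random_ratio`; otherwise it is `mask_id`.
    """
    tokens = []
    for _ in range(seq_len):
        if rng.random() < random_ratio:
            tokens.append(rng.randrange(vocab_size))  # injected randomness
        else:
            tokens.append(mask_id)  # same input a masked teacher sees at its first step
    return tokens


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def token_level_kl(teacher_logits: Sequence[Sequence[float]],
                   student_logits: Sequence[Sequence[float]]) -> float:
    """Mean over positions of KL(p_teacher || p_student), one categorical per token."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p = softmax(t_row)
        q = softmax(s_row)
        total += sum(pi * (math.log(pi) - math.log(qi))
                     for pi, qi in zip(p, q) if pi > 0)
    return total / len(teacher_logits)
```

Treating each position as an independent categorical is what makes the divergence "token-level": the sequence-level objective is approximated by an average of per-token terms, which avoids enumerating the exponentially large space of full token sequences.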
Cite
Zhu et al. "Di[M]O: Distilling Masked Diffusion Models into One-Step Generator." International Conference on Computer Vision, 2025.
@inproceedings{zhu2025iccv-di,
title = {{Di[M]O: Distilling Masked Diffusion Models into One-Step Generator}},
author = {Zhu, Yuanzhi and Wang, Xi and Lathuilière, Stéphane and Kalogeiton, Vicky},
booktitle = {International Conference on Computer Vision},
year = {2025},
pages = {18606-18618},
url = {https://mlanthology.org/iccv/2025/zhu2025iccv-di/}
}