MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale

Abstract

We introduce MarDini, a new family of video diffusion models that integrate the advantages of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR handles temporal planning, while DM focuses on spatial generation in an asymmetric network design: i) a MAR-based planning model containing most of the parameters generates planning signals for each masked frame using low-resolution input; ii) a lightweight generation model uses these signals to produce high-resolution frames via diffusion denoising. MarDini’s MAR enables video generation conditioned on any number of masked frames at any frame positions: a single model can handle video interpolation (e.g., masking middle frames), image-to-video generation (e.g., masking from the second frame onward), and video expansion (e.g., masking half the frames). The efficient design allocates most of the computational resources to the low-resolution planning model, making computationally expensive but important spatio-temporal attention feasible at scale. MarDini sets a new state of the art for video interpolation; meanwhile, within a few inference steps, it efficiently generates videos on par with those of much more expensive advanced image-to-video models.
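The three tasks named in the abstract differ only in which frame positions are masked. The sketch below illustrates that idea with plain boolean masks, where `True` marks a frame to be generated and `False` marks a frame given as conditioning. The function names, the `True`-means-masked convention, and the choice to mask the second half of the clip for video expansion are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
from typing import List


def interpolation_mask(num_frames: int) -> List[bool]:
    """Mask the middle frames; the first and last frames stay as conditioning."""
    if num_frames < 3:
        raise ValueError("interpolation needs at least 3 frames")
    return [0 < i < num_frames - 1 for i in range(num_frames)]


def image_to_video_mask(num_frames: int) -> List[bool]:
    """Mask every frame from the second onward; only the first frame is given."""
    if num_frames < 2:
        raise ValueError("image-to-video needs at least 2 frames")
    return [i >= 1 for i in range(num_frames)]


def expansion_mask(num_frames: int) -> List[bool]:
    """Mask half the frames (assumed here: the second half of the clip)."""
    if num_frames < 2:
        raise ValueError("expansion needs at least 2 frames")
    return [i >= num_frames // 2 for i in range(num_frames)]


if __name__ == "__main__":
    for name, build in [
        ("interpolation", interpolation_mask),
        ("image-to-video", image_to_video_mask),
        ("expansion", expansion_mask),
    ]:
        # Print each mask as a string: 'M' = masked (generate), '.' = given.
        mask = build(8)
        print(f"{name:>15}: " + "".join("M" if m else "." for m in mask))
```

Under these assumptions, a single model that accepts an arbitrary mask of this kind would cover all three tasks without task-specific changes, which is the flexibility the abstract attributes to MAR.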

Cite

Text

Liu et al. "MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale." Transactions on Machine Learning Research, 2025.

Markdown

[Liu et al. "MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/liu2025tmlr-mardini/)

BibTeX

@article{liu2025tmlr-mardini,
  title     = {{MarDini: Masked Auto-Regressive Diffusion for Video Generation at Scale}},
  author    = {Liu, Haozhe and Liu, Shikun and Zhou, Zijian and Xu, Mengmeng and Xie, Yanping and Han, Xiao and Perez, Juan Camilo and Liu, Ding and Kahatapitiya, Kumara and Jia, Menglin and Wu, Jui-Chieh and He, Sen and Xiang, Tao and Schmidhuber, Jürgen and Perez-Rua, Juan-Manuel},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/liu2025tmlr-mardini/}
}