DLFR-Gen: Diffusion-Based Video Generation with Dynamic Latent Frame Rate

Yuan, Zhihang; Xie, Rui; Shang, Yuzhang; Zhang, Hanling; Wang, Siyuan; Yan, Shengen; Dai, Guohao; Wang, Yu

ICCV 2025 pp. 16410-16419

Abstract

Diffusion Transformer (DiT)-based generation models have achieved remarkable success in video generation. However, their heavy computational demands pose significant efficiency challenges. In this paper, we exploit the temporal non-uniformity of real-world videos: information density varies over time, with high-motion segments demanding greater detail preservation than static scenes. Inspired by this observation, we propose DLFR-Gen, a training-free approach for Diffusion-based Video Generation with Dynamic Latent Frame Rate. DLFR-Gen adaptively adjusts the number of elements in latent space based on the motion frequency of the latent-space content, using fewer tokens for low-frequency segments while preserving detail in high-frequency segments. Specifically, our key contributions are: (1) A dynamic frame rate scheduler for DiT video generation that adaptively assigns frame rates to video segments. (2) A novel latent-space frame merging method that aligns latent representations with their denoised counterparts before merging redundant ones in low-resolution space. (3) A preference analysis of Rotary Positional Embeddings (RoPE) across DiT layers, informing a tailored RoPE strategy optimized for semantic and local information capture. Experiments show that DLFR-Gen achieves up to a 3x speedup in video generation with minimal quality degradation.
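To make the idea of a dynamic latent frame rate concrete, here is a minimal toy sketch in plain Python. It is not the paper's algorithm: the function names (`motion_score`, `assign_strides`, `merge_frames`), the mean-absolute-difference motion measure, the fixed threshold, and the averaging merge are all assumptions chosen for illustration. The alignment with denoised latents and the RoPE strategy mentioned in the abstract are omitted entirely.

```python
from typing import List

Frame = List[float]  # toy stand-in for one flattened latent frame


def motion_score(segment: List[Frame]) -> float:
    """Mean absolute difference between consecutive frames (assumed motion proxy)."""
    if len(segment) < 2:
        return 0.0
    total, count = 0.0, 0
    for prev, cur in zip(segment, segment[1:]):
        for a, b in zip(prev, cur):
            total += abs(a - b)
            count += 1
    return total / count


def assign_strides(frames: List[Frame], segment_len: int = 4,
                   threshold: float = 0.1, low_motion_stride: int = 2) -> List[int]:
    """One temporal stride per segment: 1 keeps every frame, larger values merge frames."""
    strides = []
    for start in range(0, len(frames), segment_len):
        segment = frames[start:start + segment_len]
        high_motion = motion_score(segment) >= threshold
        strides.append(1 if high_motion else low_motion_stride)
    return strides


def merge_frames(frames: List[Frame], strides: List[int],
                 segment_len: int = 4) -> List[Frame]:
    """Average each group of `stride` consecutive frames within a segment."""
    merged = []
    for i, start in enumerate(range(0, len(frames), segment_len)):
        segment = frames[start:start + segment_len]
        stride = strides[i]
        for g in range(0, len(segment), stride):
            group = segment[g:g + stride]
            merged.append([sum(vals) / len(group) for vals in zip(*group)])
    return merged


# Four static frames followed by four frames with steady motion.
frames = [[0.0, 0.0] for _ in range(4)] + [[float(k), float(k)] for k in range(1, 5)]
strides = assign_strides(frames)
merged = merge_frames(frames, strides)
print(strides)                        # [2, 1]
print(len(frames), "->", len(merged))  # 8 -> 6
```

In this toy setting the static segment is represented with half as many frames while the moving segment keeps all of its frames, which mirrors the abstract's description of spending fewer tokens on low-frequency segments.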

Cite

Text

Yuan et al. "DLFR-Gen: Diffusion-Based Video Generation with Dynamic Latent Frame Rate." International Conference on Computer Vision, 2025.

Markdown

[Yuan et al. "DLFR-Gen: Diffusion-Based Video Generation with Dynamic Latent Frame Rate." International Conference on Computer Vision, 2025.](https://mlanthology.org/iccv/2025/yuan2025iccv-dlfrgen/)

BibTeX

@inproceedings{yuan2025iccv-dlfrgen,
  title     = {{DLFR-Gen: Diffusion-Based Video Generation with Dynamic Latent Frame Rate}},
  author    = {Yuan, Zhihang and Xie, Rui and Shang, Yuzhang and Zhang, Hanling and Wang, Siyuan and Yan, Shengen and Dai, Guohao and Wang, Yu},
  booktitle = {International Conference on Computer Vision},
  year      = {2025},
  pages     = {16410--16419},
  url       = {https://mlanthology.org/iccv/2025/yuan2025iccv-dlfrgen/}
}