Pyramid Patchification Flow for Visual Generation

Li, Hui; Chen, Baoyou; Jiaye, Li; Wang, Jingdong; Zhu, Siyu

ICLR 2026

Abstract

Diffusion Transformers (DiTs) typically use the same patch size for $\operatorname{Patchify}$ regardless of timestep, which enforces a constant token budget across all timesteps. In this paper, we introduce Pyramidal Patchification Flow (PPFlow), which reduces the number of tokens at high-noise timesteps to improve sampling efficiency. The idea is simple: use larger patches at higher-noise timesteps and smaller patches at lower-noise timesteps. The implementation is easy: share the DiT's transformer blocks across timesteps, and learn separate linear projections for each patch size in $\operatorname{Patchify}$ and $\operatorname{Unpatchify}$. Unlike Pyramidal Flow, which operates on pyramid representations, our approach operates over full latent representations, eliminating trajectory "jump points" and thus avoiding re-noising tricks during sampling. Training from a pretrained SiT-XL/2 requires only $8.9\%$ additional training FLOPs and delivers a $2.02\times$ denoising speedup while maintaining image generation quality; training from scratch achieves a comparable sampling speedup, e.g., $2.04\times$ for SiT-B. When trained from the text-to-image model FLUX.1, PPFlow achieves $1.61\times$ to $1.86\times$ speedups at resolutions from 512 to 2048 with comparable quality.
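As a rough illustration of the scheme described in the abstract, the following Python/NumPy sketch uses a larger patch size at high noise and a smaller one at low noise, with separate Patchify/Unpatchify linear projections per patch size and a single block stack shared across all noise levels. Everything concrete here is an assumption for illustration, not taken from the paper: the two-stage schedule in the hypothetical `patch_size_for` (noise level `sigma`, threshold 0.5, patch sizes 4 and 2), the latent and hidden sizes, the random weights, and the one-layer `shared_blocks` stand-in for the real DiT blocks. Because an $H \times W$ latent split into $p \times p$ patches yields $N = (H/p)(W/p)$ tokens, doubling $p$ cuts the token count by a factor of 4.

```python
import numpy as np

# Hypothetical sizes for illustration only (not from the paper).
C, H, W, D = 4, 32, 32, 64  # latent channels, latent height/width, hidden width
PATCH_SIZES = (2, 4)


def patch_size_for(sigma: float) -> int:
    """Hypothetical two-stage schedule: sigma=1 is pure noise, sigma=0 is clean."""
    return 4 if sigma > 0.5 else 2


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """(C, H, W) -> (N, C*p*p) with N = (H/p)*(W/p), patches in row-major order."""
    c, h, w = x.shape
    x = x.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p)
    return x.reshape((h // p) * (w // p), c * p * p)


def unpatchify(tokens: np.ndarray, p: int, c: int, h: int, w: int) -> np.ndarray:
    """Exact inverse of patchify: (N, C*p*p) -> (C, H, W)."""
    x = tokens.reshape(h // p, w // p, c, p, p)
    x = x.transpose(2, 0, 3, 1, 4)  # (C, H/p, p, W/p, p)
    return x.reshape(c, h, w)


rng = np.random.default_rng(0)
# One (hypothetical, randomly initialized) input and output projection per patch size.
embed = {p: rng.standard_normal((C * p * p, D)) / np.sqrt(C * p * p) for p in PATCH_SIZES}
unembed = {p: rng.standard_normal((D, C * p * p)) / np.sqrt(D) for p in PATCH_SIZES}
shared_w = rng.standard_normal((D, D)) / np.sqrt(D)


def shared_blocks(h: np.ndarray) -> np.ndarray:
    # Placeholder for the transformer blocks shared across all timesteps;
    # a real DiT would apply timestep-conditioned attention + MLP here.
    return h + np.tanh(h @ shared_w)


def denoiser_forward(x: np.ndarray, sigma: float):
    """Toy forward pass: pick p from sigma, embed, run shared blocks, unembed."""
    p = patch_size_for(sigma)
    tokens = patchify(x, p) @ embed[p]  # (N, D); N shrinks 4x when p doubles
    tokens = shared_blocks(tokens)
    out = unpatchify(tokens @ unembed[p], p, C, H, W)  # always full latent shape
    return out, tokens.shape[0]


x = rng.standard_normal((C, H, W))
for sigma in (0.9, 0.2):
    out, n_tokens = denoiser_forward(x, sigma)
    # sigma=0.9 -> p=4, 64 tokens; sigma=0.2 -> p=2, 256 tokens
    print(sigma, patch_size_for(sigma), n_tokens, out.shape)
```

In this toy setup the output always has the full latent shape `(C, H, W)` whatever the patch size; only the number of tokens processed by the shared blocks changes. This mirrors the abstract's point that PPFlow operates over full latent representations rather than a resolution pyramid.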

Cite

Text

Li et al. "Pyramid Patchification Flow for Visual Generation." International Conference on Learning Representations, 2026.

Markdown

[Li et al. "Pyramid Patchification Flow for Visual Generation." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/li2026iclr-pyramid/)

BibTeX

@inproceedings{li2026iclr-pyramid,
  title     = {{Pyramid Patchification Flow for Visual Generation}},
  author    = {Li, Hui and Chen, Baoyou and Jiaye, Li and Wang, Jingdong and Zhu, Siyu},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/li2026iclr-pyramid/}
}