SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers

Park, Dogyun; Haji-Ali, Moayed; Li, Yanyu; Menapace, Willi; Tulyakov, Sergey; Kim, Hyunwoo J.; Siarohin, Aliaksandr; Kag, Anil

ICLR 2026

Abstract

Diffusion Transformers (DiTs) deliver state-of-the-art generative performance, but their training cost grows quadratically with sequence length, making large-scale pretraining prohibitively expensive. Token dropping can reduce training cost, yet naïve strategies degrade representations, and existing methods are either parameter-heavy or fail at high drop ratios. We present SPRINT (Sparse-Dense Residual Fusion for Efficient Diffusion Transformers), a simple method that enables aggressive token dropping (up to 75%) while preserving quality. SPRINT leverages the complementary roles of shallow and deep layers: early layers process all tokens to capture local detail, deeper layers operate on a sparse subset to cut computation, and their outputs are fused through residual connections. Training follows a two-stage schedule: long masked pre-training for efficiency, followed by short full-token fine-tuning to close the train-inference gap. On ImageNet-1K at 256×256 resolution, SPRINT achieves 9.8× training savings with comparable FID/FDD, and at inference, its Path-Drop Guidance (PDG) nearly halves FLOPs while improving quality. These results establish SPRINT as a simple, effective, and general solution for efficient DiT training.
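The sparse-dense data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. The abstract does not specify how tokens are selected or how the two paths are fused, so the sketch assumes uniform random selection and a simple additive residual in which dropped positions keep only their shallow features. All names (`choose_kept_indices`, `toy_layer`, `sparse_dense_forward`) are hypothetical, and `toy_layer` is a stand-in for a real transformer block.

```python
import random


def choose_kept_indices(num_tokens, drop_ratio, rng):
    """Pick a random subset of token positions to keep (assumed selection rule)."""
    num_keep = max(1, round(num_tokens * (1.0 - drop_ratio)))
    return sorted(rng.sample(range(num_tokens), num_keep))


def toy_layer(tokens, scale):
    """Stand-in for a transformer block: adds a scaled sequence mean to each token."""
    dim = len(tokens[0])
    mean = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [[t[d] + scale * mean[d] for d in range(dim)] for t in tokens]


def sparse_dense_forward(tokens, drop_ratio, rng):
    # Shallow path: every token is processed.
    dense = toy_layer(tokens, scale=0.5)

    # Deep path: only the kept subset of tokens is processed.
    kept = choose_kept_indices(len(dense), drop_ratio, rng)
    deep = toy_layer([dense[i] for i in kept], scale=0.1)

    # Fusion (assumption): scatter the deep outputs back to their positions
    # and add them to the dense features as a residual. Dropped positions
    # carry the dense (shallow) features only.
    fused = [list(t) for t in dense]
    for pos, out in zip(kept, deep):
        fused[pos] = [a + b for a, b in zip(dense[pos], out)]
    return fused, kept


rng = random.Random(0)
tokens = [[float(i), float(-i)] for i in range(16)]
fused, kept = sparse_dense_forward(tokens, drop_ratio=0.75, rng=rng)
print(len(fused), len(kept))
```

With a 75% drop ratio, the deep path in this sketch sees a quarter of the tokens, so any per-layer cost that scales quadratically with sequence length falls to roughly one sixteenth for those layers, while the output still has one entry per input token.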

Cite

Text

Park et al. "SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers." International Conference on Learning Representations, 2026.

Markdown

[Park et al. "SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/park2026iclr-sprint/)

BibTeX

@inproceedings{park2026iclr-sprint,
  title     = {{SPRINT: Sparse-Dense Residual Fusion for Efficient Diffusion Transformers}},
  author    = {Park, Dogyun and Haji-Ali, Moayed and Li, Yanyu and Menapace, Willi and Tulyakov, Sergey and Kim, Hyunwoo J. and Siarohin, Aliaksandr and Kag, Anil},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/park2026iclr-sprint/}
}