REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Abstract
Diffusion Transformers (DiTs) deliver state-of-the-art image quality, yet their training remains notoriously slow. A recent remedy, representation alignment (REPA), which matches DiT hidden features to those of a non-generative teacher (e.g., DINO), dramatically accelerates the early epochs but plateaus or even degrades performance later. We trace this failure to a capacity mismatch: once the generative student begins modeling the joint data distribution, the teacher's lower-dimensional embeddings and attention patterns become a straitjacket rather than a guide. We then introduce HASTE (Holistic Alignment with Stage-wise Termination for Efficient training), a two-phase schedule that keeps the help and drops the hindrance. Phase I applies a holistic alignment loss that simultaneously distills attention maps (relational priors) and feature projections (semantic anchors) from the teacher into mid-level layers of the DiT, yielding rapid convergence. Phase II then performs a one-shot termination that deactivates the alignment loss once a simple trigger, such as a fixed iteration count, is hit, freeing the DiT to focus on denoising and to exploit its generative capacity. HASTE speeds up training of diverse DiTs without architecture changes. On ImageNet 256×256, it reaches the vanilla SiT-XL/2 baseline FID in 50 epochs and matches REPA’s best FID in 500 epochs, amounting to a 28× reduction in optimization steps. HASTE also improves text-to-image DiTs on MS-COCO, proving to be a simple yet principled recipe for efficient diffusion training across various tasks.
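To make the two-phase schedule concrete, the following is a minimal PyTorch-style sketch under stated assumptions: the model interfaces (return_internals, return_attn), the exact alignment objectives, and the hyperparameters (STOP_ITER, LAMBDA_ALIGN) are illustrative placeholders, not the authors' released implementation.

# Minimal sketch of the HASTE two-phase schedule described in the abstract.
# All interfaces and values below are assumptions for illustration only.
import torch
import torch.nn.functional as F

STOP_ITER = 50_000   # Phase-I length: a simple trigger such as a fixed iteration (assumed value)
LAMBDA_ALIGN = 0.5   # weight on the holistic alignment term (assumed value)

def holistic_alignment_loss(student_feats, student_attn, teacher_feats, teacher_attn, proj):
    """Phase-I loss: pull projected mid-level DiT features toward the teacher's
    features (semantic anchors) and DiT attention maps toward the teacher's
    attention maps (relational priors)."""
    feat_loss = 1.0 - F.cosine_similarity(proj(student_feats), teacher_feats, dim=-1).mean()
    attn_loss = F.mse_loss(student_attn, teacher_attn)
    return feat_loss + attn_loss

def training_step(dit, teacher, proj, optimizer, batch, step):
    """One optimization step; Phase I vs. Phase II is decided by `step`."""
    clean, noisy, target, t = batch   # clean images, noised inputs, denoising target, timesteps
    pred, student_feats, student_attn = dit(noisy, t, return_internals=True)
    loss = F.mse_loss(pred, target)   # standard denoising objective

    if step < STOP_ITER:              # Phase I: holistic alignment is active
        with torch.no_grad():
            teacher_feats, teacher_attn = teacher(clean, return_attn=True)
        loss = loss + LAMBDA_ALIGN * holistic_alignment_loss(
            student_feats, student_attn, teacher_feats, teacher_attn, proj
        )
    # Phase II (step >= STOP_ITER): one-shot termination; the alignment term is
    # simply dropped and only the denoising loss is optimized.

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

In this sketch the termination is a hard switch at a fixed iteration, mirroring the "simple trigger" mentioned above; no architecture change is involved, and Phase II is plain diffusion training.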
Cite
Text
Wang et al. "REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training." Advances in Neural Information Processing Systems, 2025.
Markdown
[Wang et al. "REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wang2025neurips-repa/)
BibTeX
@inproceedings{wang2025neurips-repa,
  title     = {{REPA Works Until It Doesn’t: Early-Stopped, Holistic Alignment Supercharges Diffusion Training}},
  author    = {Wang, Ziqiao and Zhao, Wangbo and Zhou, Yuhao and Li, Zekai and Liang, Zhiyuan and Shi, Mingjia and Zhao, Xuanlei and Zhou, Pengfei and Zhang, Kaipeng and Wang, Zhangyang and Wang, Kai and You, Yang},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wang2025neurips-repa/}
}