Dual-Path Condition Alignment for Diffusion Transformers
Abstract
Denoising-based generative models have been significantly advanced by the representation-alignment (REPA) loss, which uses pre-trained visual encoders to guide intermediate network features. However, REPA's reliance on external visual encoders introduces two critical challenges: potential distribution mismatches between the encoder's training data and the generation target, and the high computational cost of pre-training. Inspired by the observation that REPA primarily helps early layers capture robust semantics, we propose an unsupervised alternative that requires neither an external visual encoder nor the assumption of a consistent data distribution. We introduce DUal-Path condition Alignment (DUPA), a novel self-alignment framework that independently noises an image multiple times, processes these noisy latents through a decoupled diffusion transformer, and then aligns the derived conditions, i.e., the low-frequency semantic features extracted from each path. Experiments demonstrate that DUPA achieves FID = 1.46 on ImageNet 256×256 with only 400 training epochs, outperforming all methods that do not rely on external supervision. DUPA is also model-agnostic and can be readily applied to any denoising-based generative model, demonstrating its scalability and generalizability. Code is available at https://github.com/PCH-gg/DUPA and https://openi.pcl.ac.cn/OpenAIDriving/DUPA.
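To make the dual-path idea more concrete, the following is a minimal, framework-free Python sketch of a self-alignment objective of the kind the abstract describes: noise one clean input several times independently, extract a condition from each noisy copy, and penalize disagreement between the conditions. It is not the authors' implementation. The linear-interpolation forward process, the independent timestep per path, the hypothetical `toy_condition_encoder` (a stand-in for the condition encoder of a decoupled diffusion transformer), and the negative-cosine-similarity loss are all illustrative assumptions. The actual noise schedule, feature extractor, and alignment loss are defined in the paper and the linked code.

```python
import math
import random


def noise_path(x0, t, rng):
    # ASSUMPTION: linear-interpolation forward process x_t = (1 - t) * x0 + t * eps,
    # with a fresh Gaussian draw eps for every call (i.e., for every path).
    return [(1.0 - t) * x + t * rng.gauss(0.0, 1.0) for x in x0]


def toy_condition_encoder(x_t, weights):
    # HYPOTHETICAL stand-in for the condition encoder of a decoupled diffusion
    # transformer: a fixed linear projection followed by tanh.
    return [math.tanh(sum(w * v for w, v in zip(row, x_t))) for row in weights]


def cosine(a, b, eps=1e-8):
    # Cosine similarity between two feature vectors (eps guards against zero norms).
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b + eps)


def dual_path_alignment_loss(x0, weights, rng, num_paths=2):
    # Noise the same clean input independently num_paths times, extract a
    # condition from each noisy copy, and penalize disagreement between them.
    if num_paths < 2:
        raise ValueError("need at least two paths to align")
    conditions = []
    for _ in range(num_paths):
        t = rng.uniform(0.0, 1.0)  # ASSUMPTION: independent timestep per path
        conditions.append(toy_condition_encoder(noise_path(x0, t, rng), weights))
    pairs = [(i, j) for i in range(num_paths) for j in range(i + 1, num_paths)]
    # ASSUMPTION: loss = mean negative cosine similarity over all path pairs.
    return -sum(cosine(conditions[i], conditions[j]) for i, j in pairs) / len(pairs)


rng = random.Random(0)
x0 = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy "clean latent"
weights = [[rng.gauss(0.0, 0.3) for _ in range(8)] for _ in range(4)]
loss = dual_path_alignment_loss(x0, weights, rng, num_paths=3)
print(f"alignment loss: {loss:.4f}")  # lies in [-1, 1]; lower means closer agreement
```

In an actual training setup, a term like this would be added to the usual denoising loss and minimized with respect to the network weights, so that conditions extracted from differently noised copies of the same image converge; this sketch only evaluates the objective and computes no gradients.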
Cite
Peng et al. "Dual-Path Condition Alignment for Diffusion Transformers." International Conference on Learning Representations, 2026.
@inproceedings{peng2026iclr-dualpath,
title = {{Dual-Path Condition Alignment for Diffusion Transformers}},
author = {Peng, Changhao and Ye, Yuqi and Du, Shuangjun and Gao, Wenxu and Gao, Wei},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/peng2026iclr-dualpath/}
}