Representation Alignment for Diffusion Transformers Without External Components
Abstract
Recent studies have demonstrated that learning a meaningful internal representation can accelerate generative training. However, existing approaches either introduce an off-the-shelf external representation task or rely on a large-scale, pre-trained external representation encoder to provide representation guidance during training. In this study, we posit that the unique discriminative process inherent to diffusion transformers enables them to offer such guidance without requiring external representation components. We propose Self-Representation Alignment (SRA), a simple yet effective method that obtains representation guidance from the internal representations of the diffusion transformer being trained. SRA aligns the latent representation of the diffusion transformer in an earlier layer conditioned on higher noise with that in a later layer conditioned on lower noise, progressively enhancing overall representation learning; the alignment is applied during training only. Experimental results indicate that applying SRA to DiTs and SiTs yields consistent performance improvements and substantially outperforms approaches that rely on an auxiliary representation task. Our approach achieves performance comparable to methods that depend on an external pre-trained representation encoder, which demonstrates that acceleration through representation alignment is feasible within diffusion transformers themselves.
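The alignment idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the objective, so the negative cosine similarity used here, the function names `cosine_similarity` and `self_alignment_loss`, and the treatment of the later-layer, lower-noise features as a fixed target are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v, eps=1e-12):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # eps guards against division by zero for all-zero vectors.
    return dot / max(norm_u * norm_v, eps)


def self_alignment_loss(student_tokens, teacher_tokens):
    """Mean (1 - cosine similarity) over paired per-token features.

    student_tokens: features from an EARLIER layer, input at HIGHER noise.
    teacher_tokens: features from a LATER layer, input at LOWER noise,
                    treated here as a fixed target (an assumption).
    """
    if len(student_tokens) != len(teacher_tokens):
        raise ValueError("student and teacher must have the same token count")
    distances = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_tokens, teacher_tokens)
    ]
    return sum(distances) / len(distances)


# Hypothetical toy features: two tokens, two channels each.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[2.0, 0.0], [1.0, 0.0]]
loss = self_alignment_loss(student, teacher)
```

In this toy input the first token pair points in the same direction and the second pair is orthogonal, so the loss sits halfway between perfect alignment (0) and orthogonality (1). In an actual training setup such a term would be added to the diffusion objective with some weighting, which the abstract does not detail.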
Cite
Text
Jiang et al. "Representation Alignment for Diffusion Transformers Without External Components." International Conference on Learning Representations, 2026.
Markdown
[Jiang et al. "Representation Alignment for Diffusion Transformers Without External Components." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/jiang2026iclr-representation/)
BibTeX
@inproceedings{jiang2026iclr-representation,
title = {{Representation Alignment for Diffusion Transformers Without External Components}},
author = {Jiang, Dengyang and Wang, Mengmeng and Li, Liuzhuozheng and Zhang, Lei and Wang, Haoyu and Wei, Wei and Dai, Guang and Zhang, Yanning and Wang, Jingdong},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/jiang2026iclr-representation/}
}