Effective Diffusion Transformer Architecture for Image Super-Resolution

Cheng, Kun; Yu, Lei; Tu, Zhijun; He, Xiao; Chen, Liyu; Guo, Yong; Zhu, Mingrui; Wang, Nannan; Gao, Xinbo; Hu, Jie

doi:10.1609/AAAI.V39I3.32247

AAAI 2025 pp. 2455-2463

Abstract

Recent advances indicate that diffusion models hold great promise in image super-resolution. While the latest methods are primarily based on latent diffusion models with convolutional neural networks, there have been few attempts to explore transformers, which have demonstrated remarkable performance in image generation. In this work, we design an effective diffusion transformer for image super-resolution (DiT-SR) that achieves the visual quality of prior-based methods while being trained from scratch. In practice, DiT-SR leverages an overall U-shaped architecture and adopts a uniform isotropic design for all the transformer blocks across different stages. The former facilitates multi-scale hierarchical feature extraction, while the latter reallocates computational resources to critical layers to further enhance performance. Moreover, we thoroughly analyze the limitation of the widely used AdaLN and present a frequency-adaptive time-step conditioning module that enhances the model's capacity to process distinct frequency information at different time steps. Extensive experiments demonstrate that DiT-SR significantly outperforms existing training-from-scratch diffusion-based SR methods and even beats some prior-based methods built on pretrained Stable Diffusion, demonstrating the superiority of diffusion transformers in image super-resolution.
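
For readers unfamiliar with AdaLN, the baseline conditioning scheme the abstract refers to, the following is a minimal sketch of how it is commonly formulated in diffusion transformers: a time-step embedding is projected to one scale and one shift per channel, which modulate the layer-normalized tokens. This is not the paper's frequency-adaptive module. The function names, the toy shapes, and the zero-initialized projection weights are all assumptions made for illustration.

```python
import math

def layer_norm(x, eps=1e-6):
    """Normalize one token's features to zero mean and unit variance (no learned affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def linear(vec, weight, bias):
    """Matrix-vector product; `weight` holds one row per output feature."""
    return [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]

def adaln(tokens, t_emb, w_scale, b_scale, w_shift, b_shift):
    """Generic AdaLN sketch: the time-step embedding yields one (scale, shift)
    pair per channel, and the same pair is applied to every token."""
    scale = linear(t_emb, w_scale, b_scale)
    shift = linear(t_emb, w_shift, b_shift)
    out = []
    for tok in tokens:
        normed = layer_norm(tok)
        out.append([n * (1.0 + s) + h for n, s, h in zip(normed, scale, shift)])
    return out

# Toy inputs (all values hypothetical): 2 tokens, 4 channels, 2-dim time-step embedding.
tokens = [[1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.5, -1.5]]
t_emb = [0.1, -0.2]
zeros_w = [[0.0, 0.0] for _ in range(4)]
zeros_b = [0.0] * 4

# With zero projections the modulation is the identity, so the output is plain layer norm.
out = adaln(tokens, t_emb, zeros_w, zeros_b, zeros_w, zeros_b)
```

In this formulation the modulation depends only on the time step and the channel index, so every token position receives the same scale and shift.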

Cite

Text

Cheng et al. "Effective Diffusion Transformer Architecture for Image Super-Resolution." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I3.32247

Markdown

[Cheng et al. "Effective Diffusion Transformer Architecture for Image Super-Resolution." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/cheng2025aaai-effective/) doi:10.1609/AAAI.V39I3.32247

BibTeX

@inproceedings{cheng2025aaai-effective,
  title     = {{Effective Diffusion Transformer Architecture for Image Super-Resolution}},
  author    = {Cheng, Kun and Yu, Lei and Tu, Zhijun and He, Xiao and Chen, Liyu and Guo, Yong and Zhu, Mingrui and Wang, Nannan and Gao, Xinbo and Hu, Jie},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2455-2463},
  doi       = {10.1609/AAAI.V39I3.32247},
  url       = {https://mlanthology.org/aaai/2025/cheng2025aaai-effective/}
}