LEDiT: Your Length-Extrapolatable Diffusion Transformer Without Positional Encoding

Zhang, Shen; Liang, Siyuan; Tan, Yaning; Chen, Zhaowei; Li, Linze; Wu, Ge; Chen, Yuhao; Li, Shuheng; Zhao, Zhenyu; Chen, Caihua; Liang, Jiajun; Tang, Yao

NeurIPS 2025

Abstract

Diffusion transformers (DiTs) struggle to generate images at resolutions higher than their training resolutions. The primary obstacle is that explicit positional encodings (PEs), such as RoPE, must be extrapolated to unseen positions, which degrades performance when the inference resolution differs from the training resolution. In this paper, we propose a Length-Extrapolatable Diffusion Transformer (LEDiT) to overcome this limitation. LEDiT needs no explicit PEs, thereby avoiding PE extrapolation. The key innovation of LEDiT lies in the use of causal attention. We demonstrate that causal attention can implicitly encode global positional information and show that such information facilitates extrapolation. We further introduce a locality enhancement module, which captures fine-grained local information to complement the global, coarse-grained positional information encoded by causal attention. Experimental results on both conditional and text-to-image generation tasks demonstrate that LEDiT supports up to 4$\times$ resolution scaling (e.g., from 256$\times$256 to 512$\times$512), achieving better image quality than state-of-the-art length extrapolation methods. We believe that LEDiT marks a departure from the standard RoPE-based methods and offers a promising insight into length extrapolation. Project page: https://shenzhang2145.github.io/ledit/
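
The claim that causal attention carries positional information without any explicit PE can be illustrated with a toy example. The sketch below is not LEDiT's architecture: it assumes a single attention head with identity query/key/value projections, and the helper names `attention` and `is_permutation_equivariant` are hypothetical. It only shows a general property: full self-attention without PE is permutation-equivariant (reordering the tokens merely reorders the outputs, so order is invisible to the layer), whereas a causal mask breaks that symmetry, so the outputs depend on token order.

```python
import math
import random


def attention(x, causal):
    """Toy single-head self-attention with identity projections and no
    positional encoding. x is a list of token vectors (lists of floats)."""
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        # Causal: token i attends only to tokens 0..i. Full: to all tokens.
        keys = range(i + 1) if causal else range(n)
        scores = [
            sum(a * b for a, b in zip(x[i], x[j])) / math.sqrt(d) for j in keys
        ]
        m = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(
            [sum(w / z * x[j][k] for w, j in zip(weights, keys)) for k in range(d)]
        )
    return out


def is_permutation_equivariant(x, perm, causal, tol=1e-9):
    """Return True if attention(permuted x) equals permuted attention(x)."""
    y = attention(x, causal)
    y_perm = attention([x[p] for p in perm], causal)
    return all(
        abs(a - b) <= tol
        for p, row in zip(perm, y_perm)
        for a, b in zip(y[p], row)
    )


rng = random.Random(0)
tokens = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(6)]
perm = [5, 4, 3, 2, 1, 0]  # reverse the token order

full_equivariant = is_permutation_equivariant(tokens, perm, causal=False)
causal_equivariant = is_permutation_equivariant(tokens, perm, causal=True)
print("full attention equivariant:", full_equivariant)
print("causal attention equivariant:", causal_equivariant)
```

In this toy setting the full-attention check passes and the causal check fails, which matches the intuition in the abstract that a causal mask alone gives the model access to ordering information.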

Cite

Text

Zhang et al. "LEDiT: Your Length-Extrapolatable Diffusion Transformer Without Positional Encoding." Advances in Neural Information Processing Systems, 2025.

Markdown

[Zhang et al. "LEDiT: Your Length-Extrapolatable Diffusion Transformer Without Positional Encoding." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/zhang2025neurips-ledit/)

BibTeX

@inproceedings{zhang2025neurips-ledit,
  title     = {{LEDiT: Your Length-Extrapolatable Diffusion Transformer Without Positional Encoding}},
  author    = {Zhang, Shen and Liang, Siyuan and Tan, Yaning and Chen, Zhaowei and Li, Linze and Wu, Ge and Chen, Yuhao and Li, Shuheng and Zhao, Zhenyu and Chen, Caihua and Liang, Jiajun and Tang, Yao},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/zhang2025neurips-ledit/}
}