DiT4Edit: Diffusion Transformer for Image Editing

Abstract

Despite recent advances in UNet-based image editing, methods for shape-aware object editing in high-resolution images are still lacking. Compared to UNet, Diffusion Transformers (DiT) demonstrate a superior capability to capture long-range dependencies among patches, leading to higher-quality image generation. In this paper, we propose DiT4Edit, the first Diffusion Transformer-based image editing framework. Specifically, DiT4Edit uses the DPM-Solver inversion algorithm to obtain the inverted latents, reducing the number of steps compared to the DDIM inversion algorithm commonly used in UNet-based frameworks. Additionally, we design unified attention control and patch merging tailored for transformer computation streams. This integration allows our framework to generate higher-quality edited images faster. Our design leverages the advantages of DiT, enabling it to surpass UNet structures in image editing, especially for high-resolution and arbitrary-size images. Extensive experiments demonstrate the strong performance of DiT4Edit in various editing scenarios, highlighting the potential of diffusion transformers for image editing.
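The abstract contrasts DPM-Solver inversion with the DDIM inversion used in UNet-based editors. As background, the sketch below illustrates the deterministic DDIM-style inversion round trip (latent → noise → latent) that such editing pipelines rely on; it is not the paper's algorithm. The noise schedule is a toy one, and `eps_model` is a hypothetical placeholder for the learned noise predictor (a real editor would call the diffusion transformer here).

```python
import numpy as np

# Toy cumulative noise schedule (alpha-bar), T timesteps.
T = 50
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def eps_model(x, t):
    # Hypothetical stand-in for the learned noise predictor;
    # a real pipeline would query the diffusion model here.
    return np.zeros_like(x)

def ddim_step(x, t_from, t_to):
    """One deterministic DDIM-style update from t_from to t_to.
    t_to > t_from performs inversion (latent -> noise);
    t_to < t_from performs sampling (noise -> latent)."""
    a_from, a_to = alpha_bars[t_from], alpha_bars[t_to]
    eps = eps_model(x, t_from)
    # Predict the clean latent, then re-noise it to the target timestep.
    x0_pred = (x - np.sqrt(1.0 - a_from) * eps) / np.sqrt(a_from)
    return np.sqrt(a_to) * x0_pred + np.sqrt(1.0 - a_to) * eps

# Round trip: invert a latent up to t = T-1, then sample back down.
x = np.random.default_rng(0).standard_normal((4, 4))
z = x
for t in range(T - 1):          # inversion: 0 -> T-1
    z = ddim_step(z, t, t + 1)
for t in range(T - 1, 0, -1):   # sampling:  T-1 -> 0
    z = ddim_step(z, t, t - 1)

assert np.allclose(z, x)  # deterministic round trip recovers the latent
```

Because each update is deterministic, the inversion is exactly reversible; the practical cost is the number of model evaluations per trajectory, which is where a higher-order solver such as DPM-Solver reduces the step count.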

Cite

Text

Feng et al. "DiT4Edit: Diffusion Transformer for Image Editing." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I3.32304

Markdown

[Feng et al. "DiT4Edit: Diffusion Transformer for Image Editing." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/feng2025aaai-dit/) doi:10.1609/AAAI.V39I3.32304

BibTeX

@inproceedings{feng2025aaai-dit,
  title     = {{DiT4Edit: Diffusion Transformer for Image Editing}},
  author    = {Feng, Kunyu and Ma, Yue and Wang, Bingyuan and Qi, Chenyang and Chen, Haozhe and Chen, Qifeng and Wang, Zeyu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {2969--2977},
  doi       = {10.1609/AAAI.V39I3.32304},
  url       = {https://mlanthology.org/aaai/2025/feng2025aaai-dit/}
}