OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-on

Abstract

We present OOTDiffusion, a novel network architecture for realistic and controllable image-based virtual try-on (VTON). We leverage the power of pretrained latent diffusion models, designing an outfitting UNet to learn the detailed garment features. Without a redundant warping process, the garment features are precisely aligned with the target human body via the proposed outfitting fusion in the self-attention layers of the denoising UNet. To further enhance controllability, we introduce outfitting dropout to the training process, which enables us to adjust the strength of the garment features through classifier-free guidance. Our comprehensive experiments on the VITON-HD and Dress Code datasets demonstrate that OOTDiffusion efficiently generates high-quality try-on results for arbitrary human and garment images, outperforming other VTON methods in both realism and controllability and indicating a breakthrough in virtual try-on.

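The abstract names two mechanisms: outfitting fusion, which injects garment features into the self-attention layers of the denoising UNet, and outfitting dropout, which enables classifier-free guidance over the garment condition. The PyTorch sketch below illustrates both under stated assumptions; the class and function names, tensor shapes, and the zeroed-out unconditional garment input are illustrative choices, not taken from the authors' released code.

```python
# Minimal sketch of the two mechanisms named in the abstract.
# All names and shapes are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class OutfittingFusionAttention(nn.Module):
    """Self-attention with outfitting fusion (illustrative sketch).

    Garment features from the outfitting UNet are concatenated with the
    denoising UNet's hidden states along the token dimension, so that
    self-attention can align garment details with the human body; only
    the human half of the output is kept.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.to_out = nn.Linear(dim, dim)

    def forward(self, x_human: torch.Tensor, x_garment: torch.Tensor) -> torch.Tensor:
        # (B, N, C) human tokens + (B, N, C) garment tokens -> (B, 2N, C)
        h = torch.cat([x_human, x_garment], dim=1)
        attn = F.scaled_dot_product_attention(self.to_q(h), self.to_k(h), self.to_v(h))
        # Discard the garment half: it only served as attention context.
        return self.to_out(attn[:, : x_human.shape[1], :])


def guided_noise(unet, z_t, t, garment_feats, s_g: float = 1.5):
    """Classifier-free guidance over the garment condition (illustrative).

    Outfitting dropout randomly replaces garment features with a null
    condition during training (modeled here as zeros), so the model also
    learns an unconditional prediction; at inference, the garment scale
    s_g blends the conditional and unconditional noise predictions.
    """
    eps_cond = unet(z_t, t, garment_feats)
    eps_uncond = unet(z_t, t, torch.zeros_like(garment_feats))
    return eps_uncond + s_g * (eps_cond - eps_uncond)


if __name__ == "__main__":
    fusion = OutfittingFusionAttention(dim=64)
    human = torch.randn(1, 16, 64)
    garment = torch.randn(1, 16, 64)
    print(fusion(human, garment).shape)  # torch.Size([1, 16, 64])
```

A garment scale of s_g = 1 reduces to the plain conditional prediction; larger values strengthen the garment features, which is the controllability that outfitting dropout makes possible.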
Cite

Text

Xu et al. "OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-on." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I9.32973

Markdown

[Xu et al. "OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-on." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/xu2025aaai-ootdiffusion/) doi:10.1609/AAAI.V39I9.32973

BibTeX

@inproceedings{xu2025aaai-ootdiffusion,
  title     = {{OOTDiffusion: Outfitting Fusion Based Latent Diffusion for Controllable Virtual Try-on}},
  author    = {Xu, Yuhao and Gu, Tao and Chen, Weifeng and Chen, Arlene},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8996--9004},
  doi       = {10.1609/AAAI.V39I9.32973},
  url       = {https://mlanthology.org/aaai/2025/xu2025aaai-ootdiffusion/}
}