Improving Diffusion Models for Authentic Virtual Try-on in the Wild
Abstract
This paper considers image-based virtual try-on, which renders an image of a person wearing a curated garment, given a pair of images depicting the person and the garment, respectively. Previous works adapt existing exemplar-based inpainting diffusion models for virtual try-on to improve the naturalness of the generated visuals compared to other methods (e.g., GAN-based), but they fail to preserve the identity of the garments. To overcome this limitation, we propose a novel diffusion model that improves garment fidelity and generates authentic virtual try-on images. Our method, coined IDM-VTON, uses two different modules to encode the semantics of the garment image; given the base UNet of the diffusion model, 1) the high-level semantics extracted from a visual encoder are fused to the cross-attention layer, and then 2) the low-level features extracted from a parallel UNet are fused to the self-attention layer. In addition, we provide detailed textual prompts for both garment and person images to enhance the authenticity of the generated visuals. Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Our experimental results show that our method outperforms previous approaches (both diffusion-based and GAN-based) in preserving garment details and generating authentic virtual try-on images, both qualitatively and quantitatively. Furthermore, the proposed customization method demonstrates its effectiveness in a real-world scenario. More visualizations are available on our project page.
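The two fusion paths described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: it omits projection matrices, multi-head splitting, and positional handling, and all shapes and variable names (`person`, `garment_hi`, `garment_lo`) are hypothetical.

```python
import numpy as np

def attention(q, k, v):
    # plain scaled dot-product attention with a row-wise softmax
    scores = q @ k.T / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
d = 8
person = rng.standard_normal((16, d))      # features inside the base UNet
garment_hi = rng.standard_normal((4, d))   # high-level tokens from a visual encoder
garment_lo = rng.standard_normal((16, d))  # low-level features from a parallel UNet

# 1) cross-attention: queries come from the base-UNet person features,
#    keys/values from the visual-encoder garment tokens (high-level semantics)
fused = person + attention(person, garment_hi, garment_hi)

# 2) self-attention: keys/values are the person features concatenated with
#    the parallel-UNet garment features, so fine garment detail can attend in
kv = np.concatenate([fused, garment_lo], axis=0)
out = fused + attention(fused, kv, kv)
print(out.shape)  # (16, 8)
```

The residual additions stand in for the usual attention-block skip connections; the key point is that semantics enter via cross-attention while detail enters by enlarging the key/value set of self-attention.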
Cite
Text
Choi et al. "Improving Diffusion Models for Authentic Virtual Try-on in the Wild." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73016-0_13

Markdown
[Choi et al. "Improving Diffusion Models for Authentic Virtual Try-on in the Wild." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/choi2024eccv-improving/) doi:10.1007/978-3-031-73016-0_13

BibTeX
@inproceedings{choi2024eccv-improving,
title = {{Improving Diffusion Models for Authentic Virtual Try-on in the Wild}},
author = {Choi, Yisol and Kwak, Sangkyung and Lee, Kyungmin and Choi, Hyungwon and Shin, Jinwoo},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2024},
doi = {10.1007/978-3-031-73016-0_13},
url = {https://mlanthology.org/eccv/2024/choi2024eccv-improving/}
}