TryOnDiffusion: A Tale of Two UNets

Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman

CVPR 2023 pp. 4606-4615

doi:10.1109/CVPR52729.2023.00447

Abstract

Given two images depicting a person and a garment worn by another person, our goal is to generate a visualization of how the garment might look on the input person. A key challenge is to synthesize a photorealistic detail-preserving visualization of the garment, while warping the garment to accommodate a significant body pose and shape change across the subjects. Previous methods either focus on garment detail preservation without effective pose and shape variation, or allow try-on with the desired shape and pose but lack garment details. In this paper, we propose a diffusion-based architecture that unifies two UNets (referred to as Parallel-UNet), which allows us to preserve garment details and warp the garment for significant pose and body change in a single network. The key ideas behind Parallel-UNet include: 1) garment is warped implicitly via a cross attention mechanism, 2) garment warp and person blend happen as part of a unified process as opposed to a sequence of two separate tasks. Experimental results indicate that TryOnDiffusion achieves state-of-the-art performance both qualitatively and quantitatively.
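The abstract's first key idea, warping the garment implicitly through cross attention, can be illustrated with a small sketch. This is generic scaled dot-product cross attention in NumPy, not the Parallel-UNet itself: person features act as queries and garment features supply keys and values, so each person location draws a weighted mix of garment features instead of following an explicit flow field. The function name `cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, and all shapes are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(person_feats, garment_feats, w_q, w_k, w_v):
    """Hypothetical single-head cross attention.

    person_feats:  (n_person, c)  flattened person feature map (queries)
    garment_feats: (n_garment, c) flattened garment feature map (keys/values)
    Returns the attended garment features, one row per person location,
    and the attention weights.
    """
    q = person_feats @ w_q             # (n_person, d)
    k = garment_feats @ w_k            # (n_garment, d)
    v = garment_feats @ w_v            # (n_garment, d)
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)    # each row sums to 1 over garment locations
    return attn @ v, attn


# Toy shapes only: 16 person locations, 24 garment locations, 32 channels.
rng = np.random.default_rng(0)
person = rng.normal(size=(16, 32))
garment = rng.normal(size=(24, 32))
w_q, w_k, w_v = (rng.normal(size=(32, 8)) for _ in range(3))

warped, attn = cross_attention(person, garment, w_q, w_k, w_v)
```

Each row of `attn` is a soft correspondence from one person location to all garment locations, which is the sense in which attention can stand in for an explicit warp in this sketch.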


Cite

Text

Zhu et al. "TryOnDiffusion: A Tale of Two UNets." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00447

Markdown

[Zhu et al. "TryOnDiffusion: A Tale of Two UNets." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/zhu2023cvpr-tryondiffusion/) doi:10.1109/CVPR52729.2023.00447

BibTeX

@inproceedings{zhu2023cvpr-tryondiffusion,
  title     = {{TryOnDiffusion: A Tale of Two UNets}},
  author    = {Zhu, Luyang and Yang, Dawei and Zhu, Tyler and Reda, Fitsum and Chan, William and Saharia, Chitwan and Norouzi, Mohammad and Kemelmacher-Shlizerman, Ira},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {4606--4615},
  doi       = {10.1109/CVPR52729.2023.00447},
  url       = {https://mlanthology.org/cvpr/2023/zhu2023cvpr-tryondiffusion/}
}