M3D-VTON: A Monocular-to-3D Virtual Try-on Network

Abstract

Virtual 3D try-on can provide an intuitive and realistic view for online shopping and has huge potential commercial value. However, existing 3D virtual try-on methods mainly rely on annotated 3D human shapes and garment templates, which hinders their application in practical scenarios. 2D virtual try-on approaches provide a faster alternative for manipulating clothed humans, but lack a rich and realistic 3D representation. In this paper, we propose a novel Monocular-to-3D Virtual Try-On Network (M3D-VTON) that builds on the merits of both 2D and 3D approaches. By integrating 2D information efficiently and learning a mapping that lifts the 2D representation to 3D, we make the first attempt to reconstruct a 3D try-on mesh taking only the target clothing and a person image as inputs. The proposed M3D-VTON includes three modules: 1) the Monocular Prediction Module (MPM), which estimates an initial full-body depth map and accomplishes 2D clothes-person alignment through a novel two-stage warping procedure; 2) the Depth Refinement Module (DRM), which refines the initial body depth to produce more detailed pleat and face characteristics; 3) the Texture Fusion Module (TFM), which fuses the warped clothing with the non-target body parts to refine the results. We also construct a high-quality synthesized Monocular-to-3D virtual try-on dataset, in which each person image is associated with a front and a back depth map. Extensive experiments demonstrate that the proposed M3D-VTON can manipulate and reconstruct the 3D human body wearing the given clothing with compelling details and is more efficient than other 3D approaches.
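The 3D output described above rests on pairing each person image with a front and a back depth map. The following is a minimal sketch, not the authors' code, of how such a pair could be lifted into a 3D point cloud. It assumes an orthographic camera looking along +z, depth values measured from the camera so that the back surface lies farther away, and a binary person mask; the function name `depth_pair_to_points` and the `pixel_size` parameter are hypothetical.

```python
import numpy as np


def depth_pair_to_points(front_depth, back_depth, mask, pixel_size=1.0):
    """Lift a front/back depth-map pair to a 3D point cloud (hypothetical helper).

    Assumption: orthographic camera looking along +z, so pixel (row, col)
    maps to x = col * pixel_size, y = -row * pixel_size, and the depth value
    gives z. Only pixels inside the person mask are kept.
    """
    front_depth = np.asarray(front_depth, dtype=np.float64)
    back_depth = np.asarray(back_depth, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)
    if front_depth.shape != back_depth.shape or front_depth.shape != mask.shape:
        raise ValueError("depth maps and mask must share the same HxW shape")

    # Row/column indices of foreground pixels, in row-major order.
    rows, cols = np.nonzero(mask)
    x = cols * pixel_size
    y = -rows * pixel_size  # image rows grow downward; flip so +y points up

    # Each foreground pixel yields one point on the front surface
    # and one point on the back surface, sharing the same (x, y).
    front = np.stack([x, y, front_depth[rows, cols]], axis=1)
    back = np.stack([x, y, back_depth[rows, cols]], axis=1)
    return np.concatenate([front, back], axis=0)


# Usage: a 2x2 toy example with three foreground pixels.
front = np.array([[1.0, 1.2], [1.1, 1.3]])
back = front + 0.5
mask = np.array([[True, False], [True, True]])
pts = depth_pair_to_points(front, back, mask)
print(pts.shape)  # (6, 3)
```

Turning these points into a closed mesh would additionally require connecting neighbouring pixels (for example, triangulating each 2×2 pixel block and stitching the front and back surfaces along the silhouette); that step is omitted from this sketch.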

Cite

Text

Zhao et al. "M3D-VTON: A Monocular-to-3D Virtual Try-on Network." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.01299

Markdown

[Zhao et al. "M3D-VTON: A Monocular-to-3D Virtual Try-on Network." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/zhao2021iccv-m3dvton/) doi:10.1109/ICCV48922.2021.01299

BibTeX

@inproceedings{zhao2021iccv-m3dvton,
  title     = {{M3D-VTON: A Monocular-to-3D Virtual Try-on Network}},
  author    = {Zhao, Fuwei and Xie, Zhenyu and Kampffmeyer, Michael and Dong, Haoye and Han, Songfang and Zheng, Tianxiang and Zhang, Tao and Liang, Xiaodan},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {13239--13249},
  doi       = {10.1109/ICCV48922.2021.01299},
  url       = {https://mlanthology.org/iccv/2021/zhao2021iccv-m3dvton/}
}