ClothFormer: Taming Video Virtual Try-on in All Module
Abstract
The task of video virtual try-on aims to fit target clothes to a person in a video with spatio-temporal consistency. Despite the tremendous progress of image-based virtual try-on, such methods produce inconsistencies between frames when applied to videos. A limited amount of work has explored video-based virtual try-on, but it fails to produce visually pleasing and temporally coherent results. Moreover, there are two other key challenges: 1) how to generate accurate warping when occlusions appear in the clothing region; 2) how to generate clothes and non-target body parts (e.g., arms, neck) in harmony with a complicated background. To address them, we propose a novel video virtual try-on framework, ClothFormer, which synthesizes realistic, harmonious, and spatio-temporally consistent results in complicated environments. In particular, ClothFormer involves three major modules. First, a two-stage anti-occlusion warping module predicts an accurate dense flow mapping between the body regions and the clothing regions. Second, an appearance-flow tracking module utilizes ridge regression and optical-flow correction to smooth the dense flow sequence and generate a temporally smooth warped clothing sequence. Third, a dual-stream transformer extracts and fuses clothing texture, person features, and environment information to generate realistic try-on videos. Through rigorous experiments, we demonstrate that our method substantially surpasses the baselines in synthesized video quality, both qualitatively and quantitatively.
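To make the appearance-flow tracking idea concrete, the sketch below shows only the ridge-regression part of the temporal smoothing: each flow component is fit, per pixel, with a low-degree polynomial over time in closed form. This is a minimal illustration, not the paper's formulation; the function name smooth_flow_sequence, the array shapes, and the hyperparameters lam and degree are assumptions, and the optical-flow correction step mentioned in the abstract is omitted.

import numpy as np

def smooth_flow_sequence(flows, lam=1e-2, degree=2):
    """Temporally smooth a dense appearance-flow sequence.

    Fits, per pixel and per flow channel, a degree-`degree` polynomial over
    time with ridge regularization, then evaluates the fit at every frame.

    flows: array of shape (T, H, W, 2) holding per-frame dense flow fields.
    Returns an array of the same shape with frame-to-frame jitter suppressed.
    """
    T, H, W, C = flows.shape
    t = np.linspace(-1.0, 1.0, T)                              # normalized time axis
    X = np.stack([t ** d for d in range(degree + 1)], axis=1)  # (T, degree+1) design matrix
    # Closed-form ridge solution: coeffs = (X^T X + lam I)^{-1} X^T Y
    A = X.T @ X + lam * np.eye(degree + 1)
    Y = flows.reshape(T, -1)                                   # (T, H*W*C) targets
    coeffs = np.linalg.solve(A, X.T @ Y)                       # (degree+1, H*W*C)
    smoothed = (X @ coeffs).reshape(T, H, W, C)
    return smoothed

In this toy setting, a larger lam or smaller degree yields a stiffer temporal fit, trading per-frame accuracy for smoothness.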
Cite

Text
Jiang et al. "ClothFormer: Taming Video Virtual Try-on in All Module." Conference on Computer Vision and Pattern Recognition, 2022. doi:10.1109/CVPR52688.2022.01053

Markdown
[Jiang et al. "ClothFormer: Taming Video Virtual Try-on in All Module." Conference on Computer Vision and Pattern Recognition, 2022.](https://mlanthology.org/cvpr/2022/jiang2022cvpr-clothformer/) doi:10.1109/CVPR52688.2022.01053

BibTeX
@inproceedings{jiang2022cvpr-clothformer,
title = {{ClothFormer: Taming Video Virtual Try-on in All Module}},
author = {Jiang, Jianbin and Wang, Tan and Yan, He and Liu, Junhui},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2022},
pages = {10799-10808},
doi = {10.1109/CVPR52688.2022.01053},
url = {https://mlanthology.org/cvpr/2022/jiang2022cvpr-clothformer/}
}