Dual-Branch Collaborative Transformer for Virtual Try-on
Abstract
Image-based virtual try-on has recently gained significant attention in both the scientific community and the fashion industry, due to its challenging setting and practical real-world applications. While purely convolutional approaches have been explored to solve the task, Transformer-based architectures have not yet received significant attention. Following the intuition that self- and cross-attention operators can capture long-range dependencies and thereby improve generation quality, in this paper we extend a Transformer-based virtual try-on model with a dual-branch collaborative module that can exploit cross-modal information at generation time. We perform experiments on the VITON dataset, the standard benchmark for the task, and on Dress Code, a recently collected virtual try-on dataset with multi-category clothing. Experimental results demonstrate the effectiveness of our solution over previous methods and show that Transformer-based architectures can be a viable alternative for virtual try-on.
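The abstract describes the dual-branch collaborative module only at a high level. As a rough illustration of the idea, below is a minimal PyTorch sketch of a dual-branch block in which each stream applies self-attention and then cross-attends to the other stream. Everything here is an assumption for illustration purposes: the class name DualBranchBlock, the person/garment token streams, and the pre-norm residual layout are not taken from the paper's actual implementation.

import torch
import torch.nn as nn

class DualBranchBlock(nn.Module):
    """Hypothetical dual-branch block: self-attention within each stream,
    then cross-attention between streams (pre-norm residual layout)."""
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.sa_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sa_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ca_a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.ca_b = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norms = nn.ModuleList([nn.LayerNorm(dim) for _ in range(6)])

    def forward(self, a: torch.Tensor, b: torch.Tensor):
        # Self-attention within each branch (residual connection).
        na = self.norms[0](a)
        a = a + self.sa_a(na, na, na, need_weights=False)[0]
        nb = self.norms[1](b)
        b = b + self.sa_b(nb, nb, nb, need_weights=False)[0]
        # Cross-attention: each branch queries the other branch's tokens,
        # letting the two modalities exchange long-range information.
        qa, kb = self.norms[2](a), self.norms[3](b)
        a = a + self.ca_a(qa, kb, kb, need_weights=False)[0]
        qb, ka = self.norms[4](b), self.norms[5](a)
        b = b + self.ca_b(qb, ka, ka, need_weights=False)[0]
        return a, b

# Usage: e.g., person tokens and garment tokens of shape (batch, seq, dim).
person = torch.randn(2, 196, 256)
garment = torch.randn(2, 196, 256)
block = DualBranchBlock(dim=256)
person, garment = block(person, garment)

The cross-attention step is where cross-modal information is exchanged: each branch's tokens act as queries against the other branch's keys and values, so garment details can be propagated to the person representation and vice versa.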
Cite
Text
Fenocchi et al. "Dual-Branch Collaborative Transformer for Virtual Try-on." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00246
Markdown
[Fenocchi et al. "Dual-Branch Collaborative Transformer for Virtual Try-on." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/fenocchi2022cvprw-dualbranch/) doi:10.1109/CVPRW56347.2022.00246
BibTeX
@inproceedings{fenocchi2022cvprw-dualbranch,
title = {{Dual-Branch Collaborative Transformer for Virtual Try-on}},
author = {Fenocchi, Emanuele and Morelli, Davide and Cornia, Marcella and Baraldi, Lorenzo and Cesari, Fabio and Cucchiara, Rita},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2022},
pages = {2246--2250},
doi = {10.1109/CVPRW56347.2022.00246},
url = {https://mlanthology.org/cvprw/2022/fenocchi2022cvprw-dualbranch/}
}