Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer
Abstract
Diffusion models have shown great promise in text-guided image style transfer, but their stochastic nature creates a trade-off between style transformation and content preservation. Existing methods require computationally expensive fine-tuning of diffusion models or additional neural networks. To address this, we propose a zero-shot contrastive loss for diffusion models that requires neither additional fine-tuning nor auxiliary networks. By leveraging a patch-wise contrastive loss between embeddings of the generated samples and the original image in a pre-trained diffusion model, our method generates images that retain the semantic content of the source image in a zero-shot manner. Our approach outperforms existing methods in content preservation while requiring no additional training, not only for image style transfer but also for image-to-image translation and manipulation. Experimental results validate the effectiveness of the proposed method.
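The core mechanism described in the abstract is a patch-wise contrastive (InfoNCE-style) loss between features of the generated sample and the source image. The sketch below illustrates that idea under stated assumptions; it is not the authors' implementation, and the feature extraction from the pre-trained diffusion model is abstracted away (the random tensors stand in for hypothetical patch embeddings).

```python
# Minimal sketch of a patch-wise contrastive (InfoNCE) loss between
# patch embeddings of a generated image and the source image.
# Assumption: `gen_feats` and `src_feats` hold embeddings of patches taken
# at the same spatial locations from intermediate diffusion-model features.
import torch
import torch.nn.functional as F


def patchwise_contrastive_loss(gen_feats: torch.Tensor,
                               src_feats: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """InfoNCE loss over N spatially aligned patch embeddings of dim C.

    Each generated patch is pulled toward the source patch at the same
    location (positive pair) and pushed away from all other source
    patches (negatives), which encourages content preservation.
    """
    gen = F.normalize(gen_feats, dim=-1)
    src = F.normalize(src_feats, dim=-1)
    logits = gen @ src.t() / temperature                 # (N, N) similarities
    targets = torch.arange(gen.size(0), device=gen.device)  # positives on diagonal
    return F.cross_entropy(logits, targets)


# Usage with stand-in features: 64 patches, 320-dimensional embeddings.
gen_feats = torch.randn(64, 320)
src_feats = torch.randn(64, 320)
loss = patchwise_contrastive_loss(gen_feats, src_feats)
```

In a zero-shot setting, a loss of this form can be used as guidance during sampling (e.g., through its gradient with respect to the current estimate) rather than for training, so no fine-tuning of the diffusion model is needed.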
Cite
Text
Yang et al. "Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer." International Conference on Computer Vision, 2023. doi:10.1109/ICCV51070.2023.02091
Markdown
[Yang et al. "Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer." International Conference on Computer Vision, 2023.](https://mlanthology.org/iccv/2023/yang2023iccv-zeroshot/) doi:10.1109/ICCV51070.2023.02091
BibTeX
@inproceedings{yang2023iccv-zeroshot,
title = {{Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer}},
author = {Yang, Serin and Hwang, Hyunmin and Ye, Jong Chul},
booktitle = {International Conference on Computer Vision},
year = {2023},
pages = {22873-22882},
doi = {10.1109/ICCV51070.2023.02091},
url = {https://mlanthology.org/iccv/2023/yang2023iccv-zeroshot/}
}