Z*: Zero-Shot Style Transfer via Attention Reweighting
Abstract
Despite the remarkable progress in image style transfer, formulating style in the context of art is inherently subjective and challenging. In contrast to existing methods, this study shows that vanilla diffusion models can directly extract style information and seamlessly integrate the generative prior into the content image without retraining. Specifically, we adopt dual denoising paths to represent the content and style references in latent space, and then guide the denoising process of the content image with style latent codes. We further reveal that the cross-attention mechanism in latent diffusion models tends to blend the content and style images, resulting in stylized outputs that deviate from the original content image. To overcome this limitation, we introduce a cross-attention reweighting strategy. Through theoretical analysis and experiments, we demonstrate the effectiveness and superiority of the diffusion-based zero-shot style transfer via attention reweighting, Z-STAR.
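As an illustrative sketch only (not the authors' exact formulation), the idea of reweighting cross-attention between content queries and style keys can be pictured as scaling the style-token logits before a joint softmax over content and style tokens; the function name and the scaling factor `lam` here are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def reweighted_cross_attention(q_content, k_content, v_content,
                               k_style, v_style, lam=0.8):
    """Attend from content queries to both content and style tokens,
    down-weighting style logits by `lam` so the result stays anchored
    to the content image. Purely illustrative of the reweighting idea."""
    d = q_content.shape[-1]
    logits_c = q_content @ k_content.T / np.sqrt(d)          # content self-logits
    logits_s = lam * (q_content @ k_style.T / np.sqrt(d))    # reweighted style logits
    attn = softmax(np.concatenate([logits_c, logits_s], axis=-1))
    values = np.concatenate([v_content, v_style], axis=0)
    return attn @ values
```

With `lam` near 1 the output blends freely with style features; smaller values bias attention back toward the content tokens, mirroring the trade-off the abstract describes.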
Cite
Text

Deng et al. "Z*: Zero-Shot Style Transfer via Attention Reweighting." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00662

Markdown

[Deng et al. "Z*: Zero-Shot Style Transfer via Attention Reweighting." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/deng2024cvpr-zeroshot/) doi:10.1109/CVPR52733.2024.00662

BibTeX
@inproceedings{deng2024cvpr-zeroshot,
title = {{Z*: Zero-Shot Style Transfer via Attention Reweighting}},
author = {Deng, Yingying and He, Xiangyu and Tang, Fan and Dong, Weiming},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2024},
pages = {6934-6944},
doi = {10.1109/CVPR52733.2024.00662},
url = {https://mlanthology.org/cvpr/2024/deng2024cvpr-zeroshot/}
}