DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations

Abstract

Diffusion-based text-to-image models hold great potential for transferring the style of a reference image. However, current encoder-based approaches significantly impair the text controllability of text-to-image models while transferring styles. In this paper, we introduce DEADiff to address this issue using the following two strategies: 1) A mechanism to decouple the style and semantics of reference images. The decoupled feature representations are first extracted by Q-Formers, which are instructed by different text descriptions. They are then injected into mutually exclusive subsets of cross-attention layers for better disentanglement. 2) A non-reconstructive learning method. The Q-Formers are trained on paired images, in which the reference image and the ground-truth image share the same style or semantics, rather than on an identical reconstruction target. We show, both quantitatively and qualitatively, that DEADiff attains the best visual stylization results and an optimal balance between the text controllability inherent in the text-to-image model and style similarity to the reference image. Our project page is https://tianhao-qi.github.io/DEADiff/.
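
The first strategy, injecting the two decoupled representations into mutually exclusive subsets of cross-attention layers, can be illustrated with a minimal routing sketch. This is not the authors' implementation: the names `ReferenceFeatures`, `build_routing`, and `features_for_layer`, the layer count, and the particular split of layers are all hypothetical, and short lists of floats stand in for the Q-Former outputs.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class ReferenceFeatures:
    """Hypothetical container for the two decoupled representations."""
    style: List[float]     # stand-in for features extracted under a style instruction
    semantic: List[float]  # stand-in for features extracted under a content instruction


def build_routing(num_layers: int, style_layers: Sequence[int]) -> Dict[int, str]:
    """Assign every cross-attention layer index to exactly one feature kind."""
    style_set = set(style_layers)
    if not style_set <= set(range(num_layers)):
        raise ValueError("style_layers must contain valid layer indices")
    # Layers not listed as style layers receive the semantic features,
    # so the two subsets are disjoint by construction.
    return {i: ("style" if i in style_set else "semantic") for i in range(num_layers)}


def features_for_layer(layer: int, routing: Dict[int, str],
                       ref: ReferenceFeatures) -> List[float]:
    """Return the reference features the given layer would attend to."""
    return ref.style if routing[layer] == "style" else ref.semantic


# Assumed split for illustration only; the abstract does not say which layers get which.
routing = build_routing(num_layers=6, style_layers=[0, 1, 4, 5])
ref = ReferenceFeatures(style=[0.1, 0.2], semantic=[0.9, 0.8])
```

Because each layer index maps to a single feature kind, no layer ever sees both representations, which is the property the abstract describes as aiding disentanglement.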

Cite

Text

Qi et al. "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00830

Markdown

[Qi et al. "DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/qi2024cvpr-deadiff/) doi:10.1109/CVPR52733.2024.00830

BibTeX

@inproceedings{qi2024cvpr-deadiff,
  title     = {{DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations}},
  author    = {Qi, Tianhao and Fang, Shancheng and Wu, Yanze and Xie, Hongtao and Liu, Jiawei and Chen, Lang and He, Qian and Zhang, Yongdong},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {8693--8702},
  doi       = {10.1109/CVPR52733.2024.00830},
  url       = {https://mlanthology.org/cvpr/2024/qi2024cvpr-deadiff/}
}