Deformable One-Shot Face Stylization via DINO Semantic Guidance

Zhou, Yang; Chen, Zichong; Huang, Hui

doi:10.1109/CVPR52733.2024.00744

CVPR 2024 pp. 7787-7796

Abstract

This paper addresses the complex issue of one-shot face stylization, focusing on the simultaneous consideration of appearance and structure, where previous methods have fallen short. We explore deformation-aware face stylization that diverges from the traditional single-image style reference, opting instead for a real-style image pair. The cornerstone of our method is the use of a self-supervised vision transformer, specifically DINO-ViT, to establish a robust and consistent facial structure representation across both the real and style domains. Our stylization process begins by making the StyleGAN generator deformation-aware through the integration of spatial transformers (STN). We then introduce two novel constraints for generator fine-tuning under the guidance of DINO semantics: i) a directional deformation loss that regulates directional vectors in DINO space, and ii) a relative structural consistency constraint based on DINO token self-similarities, which ensures diverse generation. Additionally, style mixing is employed to align the color generation with the reference, minimizing inconsistent correspondences. This framework delivers enhanced deformability for general one-shot face stylization while remaining efficient, with a fine-tuning time of approximately 10 minutes. Extensive qualitative and quantitative comparisons demonstrate the superiority of our method over state-of-the-art one-shot face stylization methods. Code is available at https://github.com/zichongc/DoesFS
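The two DINO-guided constraints named in the abstract can be illustrated with a minimal sketch. It assumes a common directional-cosine form for the deformation loss: one minus the cosine between the real-to-style direction of the reference pair and the direction between a source-generator output and the adapted-generator output. For the structural term, it assumes a mean-squared difference between token self-similarity matrices. The plain Python lists stand in for DINO-ViT features, and the helper names (directional_loss, self_similarity, structure_loss) are hypothetical. The exact feature layers, token selection, and the "relative" formulation used in the paper are not given here and may differ. For example, the paper's version may compare relations across several samples rather than one pair at a time.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity between two equal-length vectors (eps avoids division by zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def subtract(a: Vector, b: Vector) -> List[float]:
    """Element-wise a - b."""
    return [x - y for x, y in zip(a, b)]


def directional_loss(real_ref: Vector, style_ref: Vector,
                     src_gen: Vector, tgt_gen: Vector) -> float:
    """Hypothetical directional loss: 1 - cos(delta_gen, delta_ref).

    All four inputs are stand-ins for DINO-ViT embeddings (an assumption).
    delta_ref is the real -> style direction of the reference pair;
    delta_gen is the source-generator -> adapted-generator direction.
    """
    delta_ref = subtract(style_ref, real_ref)
    delta_gen = subtract(tgt_gen, src_gen)
    return 1.0 - cosine(delta_gen, delta_ref)


def self_similarity(tokens: Sequence[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between the spatial tokens of one image."""
    return [[cosine(t_i, t_j) for t_j in tokens] for t_i in tokens]


def structure_loss(src_tokens: Sequence[Vector],
                   tgt_tokens: Sequence[Vector]) -> float:
    """Hypothetical structural term: mean squared difference between the
    token self-similarity matrices of a source output and its adapted output."""
    s_src = self_similarity(src_tokens)
    s_tgt = self_similarity(tgt_tokens)
    n = len(s_src)
    total = sum((s_src[i][j] - s_tgt[i][j]) ** 2
                for i in range(n) for j in range(n))
    return total / (n * n)


if __name__ == "__main__":
    # Toy embeddings: the generated pair moves along the reference direction.
    print(directional_loss([1.0, 0.0, 0.0], [1.0, 1.0, 0.0],
                           [0.5, 0.0, 0.2], [0.5, 2.0, 0.2]))  # close to 0
    # Uniformly rescaled tokens keep the same self-similarity structure.
    print(structure_loss([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                         [[2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]))  # close to 0
```

Because cosine similarity is scale-invariant, uniformly rescaling every token leaves the self-similarity matrix, and hence structure_loss, essentially unchanged, whereas changing which tokens resemble each other raises the loss. This is the sense in which a self-similarity term constrains structure rather than raw feature values.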

Cite

Text

Zhou et al. "Deformable One-Shot Face Stylization via DINO Semantic Guidance." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00744

Markdown

[Zhou et al. "Deformable One-Shot Face Stylization via DINO Semantic Guidance." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/zhou2024cvpr-deformable/) doi:10.1109/CVPR52733.2024.00744

BibTeX

@inproceedings{zhou2024cvpr-deformable,
  title     = {{Deformable One-Shot Face Stylization via DINO Semantic Guidance}},
  author    = {Zhou, Yang and Chen, Zichong and Huang, Hui},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {7787-7796},
  doi       = {10.1109/CVPR52733.2024.00744},
  url       = {https://mlanthology.org/cvpr/2024/zhou2024cvpr-deformable/}
}