StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-Trained StyleGAN

Yin, Fei; Zhang, Yong; Cun, Xiaodong; Cao, Mingdeng; Fan, Yanbo; Wang, Xuan; Bai, Qingyan; Wu, Baoyuan; Wang, Jue; Yang, Yujiu

doi:10.1007/978-3-031-19790-1_6

ECCV 2022

Abstract

One-shot talking face generation aims to synthesize a high-quality talking face video from an arbitrary portrait image, driven by a video or an audio segment. In this work, we provide a solution from a novel perspective that differs from existing frameworks. We first investigate the latent feature space of a pre-trained StyleGAN and discover that it has excellent spatial transformation properties. Based on this observation, we propose a novel unified framework built on a pre-trained StyleGAN that enables a set of powerful functionalities, i.e., high-resolution video generation, disentangled control by driving video or audio, and flexible face editing. Our framework elevates the resolution of the synthesized talking face to 1024×1024 for the first time, even though the training dataset has a lower resolution. Moreover, our framework allows two types of facial editing, i.e., global editing via GAN inversion and intuitive editing via 3D morphable models. Comprehensive experiments show superior video quality and flexible controllability over state-of-the-art methods. Code is available at https://github.com/FeiiYin/StyleHEAT.

Cite

Text

Yin et al. "StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-Trained StyleGAN." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-19790-1_6

Markdown

[Yin et al. "StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-Trained StyleGAN." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/yin2022eccv-styleheat/) doi:10.1007/978-3-031-19790-1_6

BibTeX

@inproceedings{yin2022eccv-styleheat,
  title     = {{StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-Trained StyleGAN}},
  author    = {Yin, Fei and Zhang, Yong and Cun, Xiaodong and Cao, Mingdeng and Fan, Yanbo and Wang, Xuan and Bai, Qingyan and Wu, Baoyuan and Wang, Jue and Yang, Yujiu},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-19790-1_6},
  url       = {https://mlanthology.org/eccv/2022/yin2022eccv-styleheat/}
}