SyncTalk: The Devil Is in the Synchronization for Talking Head Synthesis

Abstract

Achieving high synchronization in the synthesis of realistic, speech-driven talking head videos presents a significant challenge. Traditional Generative Adversarial Networks (GAN) struggle to maintain consistent facial identity, while Neural Radiance Fields (NeRF) methods, although they can address this issue, often produce mismatched lip movements, inadequate facial expressions, and unstable head poses. A lifelike talking head requires synchronized coordination of subject identity, lip movements, facial expressions, and head poses. The absence of these synchronizations is a fundamental flaw, leading to unrealistic and artificial outcomes. To address the critical issue of synchronization, identified as the "devil" in creating realistic talking heads, we introduce SyncTalk. This NeRF-based method effectively maintains subject identity, enhancing synchronization and realism in talking head synthesis. SyncTalk employs a Face-Sync Controller to align lip movements with speech and innovatively uses a 3D facial blendshape model to capture accurate facial expressions. Our Head-Sync Stabilizer optimizes head poses, achieving more natural head movements. The Portrait-Sync Generator restores hair details and blends the generated head with the torso for a seamless visual experience. Extensive experiments and user studies demonstrate that SyncTalk outperforms state-of-the-art methods in synchronization and realism. We recommend watching the supplementary video: https://ziqiaopeng.github.io/synctalk

Cite

Text

Peng et al. "SyncTalk: The Devil Is in the Synchronization for Talking Head Synthesis." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.00070

Markdown

[Peng et al. "SyncTalk: The Devil Is in the Synchronization for Talking Head Synthesis." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/peng2024cvpr-synctalk/) doi:10.1109/CVPR52733.2024.00070

BibTeX

@inproceedings{peng2024cvpr-synctalk,
  title     = {{SyncTalk: The Devil Is in the Synchronization for Talking Head Synthesis}},
  author    = {Peng, Ziqiao and Hu, Wentao and Shi, Yue and Zhu, Xiangyu and Zhang, Xiaomei and Zhao, Hao and He, Jun and Liu, Hongyan and Fan, Zhaoxin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {666--676},
  doi       = {10.1109/CVPR52733.2024.00070},
  url       = {https://mlanthology.org/cvpr/2024/peng2024cvpr-synctalk/}
}