FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization

Tan, Shuai; Ji, Bin; Pan, Ye

doi:10.1109/CVPR52733.2024.02486

CVPR 2024, pp. 26317-26327

Abstract

Generating emotional talking faces is a practical yet challenging endeavor. To create a lifelike avatar, we draw upon two critical insights from a human perspective: 1) the connection between audio and the non-deterministic facial dynamics, encompassing expressions, blinks, and poses, should exhibit a synchronous, one-to-many mapping; 2) vibrant expressions are often accompanied by emotion-aware high-definition (HD) textures and finely detailed teeth. However, both aspects are frequently overlooked by existing methods. To this end, this paper proposes FlowVQTalker, which uses normalizing flow and vector-quantization modeling to produce emotional talking faces that satisfy both insights concurrently. Specifically, we develop a flow-based coefficient generator that encodes the dynamics of facial emotion into a multi-emotion-class latent space represented as a mixture distribution. The generation process commences with random sampling from the modeled distribution, guided by the accompanying audio, enabling both lip synchronization and the generation of uncertain nonverbal facial cues. Furthermore, our vector-quantization image generator treats the creation of expressive facial images as a code-query task, utilizing a learned codebook to provide rich, high-quality textures that enhance the emotional perception of the results. Extensive experiments are conducted to showcase the effectiveness of our approach.
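The two mechanisms named in the abstract can be illustrated with a toy, dependency-free Python sketch. It is not the authors' implementation: the per-emotion Gaussian means, the single elementwise affine "flow", and the helpers `sample_emotion`, `audio_conditioner`, and `vq_lookup` are hypothetical stand-ins chosen only so the example runs. It shows (a) picking an emotion component of a Gaussian mixture prior, sampling a latent from it, and mapping that latent to facial coefficients through an audio-conditioned invertible transform, together with the change-of-variables log-likelihood that flows are trained with, and (b) a vector-quantization query that replaces each feature vector with its nearest codebook entry.

```python
import math
import random

# Toy sketch only: the mixture components, conditioner, and codebook below are
# hypothetical illustrations, not the FlowVQTalker architecture.

def sample_emotion(weights, rng):
    """Pick a mixture component (emotion class) according to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names])[0]

def sample_latent(mu, sigma, rng):
    """Draw z ~ N(mu, diag(sigma^2)) from one Gaussian mixture component."""
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def audio_conditioner(audio_feat):
    """Hypothetical conditioner mapping audio features to (log_scale, shift)."""
    log_scale = [0.1 * math.tanh(a) for a in audio_feat]
    shift = [0.5 * a for a in audio_feat]
    return log_scale, shift

def flow_forward(z, audio_feat):
    """Invertible elementwise affine map from latent z to coefficients x."""
    log_scale, shift = audio_conditioner(audio_feat)
    return [zi * math.exp(ls) + t for zi, ls, t in zip(z, log_scale, shift)]

def flow_inverse(x, audio_feat):
    """Exact inverse of flow_forward, mapping coefficients x back to z."""
    log_scale, shift = audio_conditioner(audio_feat)
    return [(xi - t) * math.exp(-ls) for xi, ls, t in zip(x, log_scale, shift)]

def log_likelihood(x, audio_feat, mu, sigma):
    """Change of variables: log p(x | a) = log N(z; mu, sigma) - sum(log_scale)."""
    z = flow_inverse(x, audio_feat)
    log_scale, _ = audio_conditioner(audio_feat)
    log_pz = sum(
        -0.5 * ((zi - m) / s) ** 2 - math.log(s) - 0.5 * math.log(2.0 * math.pi)
        for zi, m, s in zip(z, mu, sigma)
    )
    return log_pz - sum(log_scale)

def vq_lookup(features, codebook):
    """Replace each feature vector with its nearest codebook entry (squared L2)."""
    indices, quantized = [], []
    for f in features:
        dists = [sum((fi - ci) ** 2 for fi, ci in zip(f, c)) for c in codebook]
        k = min(range(len(codebook)), key=dists.__getitem__)
        indices.append(k)
        quantized.append(list(codebook[k]))
    return indices, quantized

if __name__ == "__main__":
    rng = random.Random(0)
    means = {"neutral": [0.0, 0.0, 0.0], "happy": [1.0, 0.5, -0.5]}
    weights = {"neutral": 0.5, "happy": 0.5}
    sigma = [0.3, 0.3, 0.3]
    audio = [0.2, -0.4, 0.9]  # stand-in for one frame of audio features

    emotion = sample_emotion(weights, rng)
    z = sample_latent(means[emotion], sigma, rng)
    x = flow_forward(z, audio)
    print(emotion, log_likelihood(x, audio, means[emotion], sigma))

    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
    idx, _ = vq_lookup([[0.9, 1.2], [-0.8, 1.7]], codebook)
    print(idx)  # [1, 2]
```

Because the transform is invertible, the same map yields exact likelihoods for training and diverse outputs at inference simply by drawing a new latent. A practical model would typically stack many learned coupling layers in place of the single hand-written affine map used here, and would learn the codebook rather than fixing it by hand.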

Cite

Text

Tan et al. "FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02486

Markdown

[Tan et al. "FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/tan2024cvpr-flowvqtalker/) doi:10.1109/CVPR52733.2024.02486

BibTeX

@inproceedings{tan2024cvpr-flowvqtalker,
  title     = {{FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization}},
  author    = {Tan, Shuai and Ji, Bin and Pan, Ye},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {26317-26327},
  doi       = {10.1109/CVPR52733.2024.02486},
  url       = {https://mlanthology.org/cvpr/2024/tan2024cvpr-flowvqtalker/}
}