FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization

Abstract

Generating emotional talking faces is a practical yet challenging endeavor. To create a lifelike avatar, we draw upon two critical insights from a human perspective: 1) The connection between audio and the non-deterministic facial dynamics, encompassing expressions, blinks, and poses, should be synchronous and exhibit a one-to-many mapping. 2) Vibrant expressions are often accompanied by emotion-aware high-definition (HD) textures and finely detailed teeth. However, both aspects are frequently overlooked by existing methods. To this end, this paper proposes using normalizing Flow and Vector-Quantization modeling to produce emotional talking faces that satisfy both insights concurrently (FlowVQTalker). Specifically, we develop a flow-based coefficient generator that encodes the dynamics of facial emotion into a multi-emotion-class latent space represented as a mixture distribution. The generation process commences with random sampling from the modeled distribution, guided by the accompanying audio, enabling both lip synchronization and the generation of uncertain nonverbal facial cues. Furthermore, our designed vector-quantization image generator treats the creation of expressive facial images as a code query task, utilizing a learned codebook to provide rich, high-quality textures that enhance the emotional perception of the results. Extensive experiments are conducted to showcase the effectiveness of our approach.
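
The "code query" idea in the abstract can be illustrated with a generic nearest-neighbor codebook lookup, the basic operation behind vector quantization. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `squared_distance` and `query_codebook`, the 2-D feature size, and the toy codebook values are all hypothetical, and a real model would learn the codebook and operate on high-dimensional image features.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_codebook(
    features: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each feature vector to the index and value of its nearest code."""
    indices: List[int] = []
    quantized: List[List[float]] = []
    for feature in features:
        # Pick the codebook entry with the smallest distance to this feature.
        best = min(
            range(len(codebook)),
            key=lambda k: squared_distance(feature, codebook[k]),
        )
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


# Toy codebook with three 2-D entries (hypothetical values, not learned).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Continuous features that an encoder might produce (also hypothetical).
features = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

indices, quantized = query_codebook(features, codebook)
print(indices)    # discrete code indices, one per feature
print(quantized)  # the codebook vectors that replace the features
```

Under this view, generating an image reduces to predicting discrete indices into the codebook, and a decoder (not shown) would turn the selected code vectors into pixels.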

Cite

Text

Tan et al. "FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.02486

Markdown

[Tan et al. "FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/tan2024cvpr-flowvqtalker/) doi:10.1109/CVPR52733.2024.02486

BibTeX

@inproceedings{tan2024cvpr-flowvqtalker,
  title     = {{FlowVQTalker: High-Quality Emotional Talking Face Generation Through Normalizing Flow and Quantization}},
  author    = {Tan, Shuai and Ji, Bin and Pan, Ye},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {26317-26327},
  doi       = {10.1109/CVPR52733.2024.02486},
  url       = {https://mlanthology.org/cvpr/2024/tan2024cvpr-flowvqtalker/}
}