Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach

Pham, Hai Xuan; Cheung, Samuel; Pavlovic, Vladimir

doi:10.1109/CVPRW.2017.287

CVPRW 2017 pp. 2328-2336

Abstract

We introduce a long short-term memory recurrent neural network (LSTM-RNN) approach for real-time facial animation, which automatically estimates head rotation and facial action unit activations of a speaker from just her speech. Specifically, the time-varying, contextual, non-linear mapping between the audio stream and visual facial movements is realized by training an LSTM neural network on a large audio-visual data corpus. In this work, we extract a set of acoustic features from the input audio, including the Mel-scaled spectrogram, Mel-frequency cepstral coefficients (MFCCs) and the chromagram, which together can effectively represent both the contextual progression and the emotional intensity of the speech. Output facial movements are characterized by the 3D head rotation and the expression blending weights of a blendshape model, which can be used directly for animation. Thus, even though our model does not explicitly predict the affective states of the target speaker, her emotional manifestation is recreated via the expression weights of the face model. Experiments on an evaluation dataset of different speakers across a wide range of affective states demonstrate promising results for our approach in real-time speech-driven facial animation.
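
The abstract describes a data flow in which per-frame acoustic features (Mel-scaled spectrogram, MFCCs, chromagram) are fed to an LSTM whose outputs are head rotation and blendshape expression weights. The following is a minimal NumPy sketch of that flow, not the authors' implementation. It rests on several assumptions that do not come from the paper: a single unidirectional LSTM layer, rotation represented as three unbounded values (e.g. angles), expression weights bounded to (0, 1) with a sigmoid, hypothetical feature and blendshape dimensions, and random untrained weights. The names `stack_features`, `AudioToFaceLSTM` and `animate` are hypothetical. In practice the three feature types could be computed with librosa (`librosa.feature.melspectrogram`, `librosa.feature.mfcc`, `librosa.feature.chroma_stft`), whose outputs are shaped (features, frames) and would need transposing; here random arrays stand in for them.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, used for LSTM gates and to bound expression weights."""
    return 1.0 / (1.0 + np.exp(-x))


def stack_features(mel, mfcc, chroma):
    """Concatenate per-frame feature matrices (frames x dims) into one input matrix."""
    if not (mel.shape[0] == mfcc.shape[0] == chroma.shape[0]):
        raise ValueError("all feature streams must have the same number of frames")
    return np.concatenate([mel, mfcc, chroma], axis=1)


class AudioToFaceLSTM:
    """Hypothetical single-layer LSTM regressor: acoustic frame -> (rotation, weights)."""

    def __init__(self, n_features, n_hidden, n_blendshapes, seed=0):
        rng = np.random.default_rng(seed)
        self.n_hidden = n_hidden
        scale = 1.0 / np.sqrt(n_features + n_hidden)
        # Stacked weights for the input, forget, candidate and output gates.
        self.W = rng.normal(0.0, scale, (4 * n_hidden, n_features + n_hidden))
        self.b = np.zeros(4 * n_hidden)
        # Linear read-out heads (random here; a real model would be trained).
        self.W_rot = rng.normal(0.0, 0.1, (3, n_hidden))
        self.W_exp = rng.normal(0.0, 0.1, (n_blendshapes, n_hidden))

    def initial_state(self):
        return np.zeros(self.n_hidden), np.zeros(self.n_hidden)

    def step(self, x, state):
        """Process one feature frame and return (rotation, weights, new_state)."""
        h, c = state
        H = self.n_hidden
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])
        c = f * c + i * g                  # update the cell memory
        h = o * np.tanh(c)                 # new hidden state
        rotation = self.W_rot @ h          # unbounded, e.g. three angles (assumption)
        weights = sigmoid(self.W_exp @ h)  # expression weights kept in (0, 1)
        return rotation, weights, (h, c)


def animate(model, features):
    """Run the model over a (frames x dims) feature matrix, one frame at a time."""
    state = model.initial_state()
    rotations, weights = [], []
    for frame in features:
        rot, w, state = model.step(frame, state)
        rotations.append(rot)
        weights.append(w)
    return np.stack(rotations), np.stack(weights)


# Usage with random placeholder features (hypothetical sizes, not the paper's).
rng = np.random.default_rng(1)
n_frames = 50
features = stack_features(
    rng.normal(size=(n_frames, 64)),  # stand-in for a log-Mel spectrogram
    rng.normal(size=(n_frames, 13)),  # stand-in for MFCCs
    rng.normal(size=(n_frames, 12)),  # stand-in for a 12-bin chromagram
)
model = AudioToFaceLSTM(n_features=features.shape[1], n_hidden=32, n_blendshapes=46)
rotations, weights = animate(model, features)
print(rotations.shape, weights.shape)  # (50, 3) (50, 46)
```

Because `step` carries the `(h, c)` state between calls, frames can be processed one at a time as audio arrives, which is one way a real-time setting could be served; a bidirectional model would instead need access to future frames.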

Cite

Text

Pham et al. "Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.287

Markdown

[Pham et al. "Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/pham2017cvprw-speechdriven/) doi:10.1109/CVPRW.2017.287

BibTeX

@inproceedings{pham2017cvprw-speechdriven,
  title     = {{Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach}},
  author    = {Pham, Hai Xuan and Cheung, Samuel and Pavlovic, Vladimir},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2017},
  pages     = {2328--2336},
  doi       = {10.1109/CVPRW.2017.287},
  url       = {https://mlanthology.org/cvprw/2017/pham2017cvprw-speechdriven/}
}