Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach
Abstract
We introduce a long short-term memory recurrent neural network (LSTM-RNN) approach for real-time facial animation, which automatically estimates head rotation and facial action unit activations of a speaker from just her speech. Specifically, the time-varying contextual non-linear mapping between the audio stream and visual facial movements is realized by training an LSTM neural network on a large audio-visual data corpus. In this work, we extract a set of acoustic features from the input audio, including the Mel-scaled spectrogram, Mel-frequency cepstral coefficients, and chromagram, which can effectively represent both the contextual progression and the emotional intensity of the speech. Output facial movements are characterized by 3D rotation and expression blending weights of a blendshape model, which can be used directly for animation. Thus, even though our model does not explicitly predict the affective states of the target speaker, her emotional manifestation is recreated via the expression weights of the face model. Experiments on an evaluation dataset of different speakers across a wide range of affective states demonstrate promising results for our approach in real-time speech-driven facial animation.
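To make the output representation concrete, the following is a minimal sketch of how a set of expression weights and a head rotation might drive a blendshape face for one frame. It assumes the common delta formulation (neutral shape plus a weighted sum of offsets to each expression target) and a single yaw angle; the abstract does not specify the paper's exact face model or rotation parameterization, and the mesh data and the names `apply_blendshapes`, `rotate_yaw`, and `animate_frame` are hypothetical.

```python
import math

def apply_blendshapes(neutral, targets, weights):
    """Return neutral + sum_i w_i * (target_i - neutral), vertex by vertex.

    Assumption: delta blendshape formulation; the paper's model may differ.
    """
    out = [list(v) for v in neutral]
    for target, w in zip(targets, weights):
        for vi, (tv, nv) in enumerate(zip(target, neutral)):
            for k in range(3):
                out[vi][k] += w * (tv[k] - nv[k])
    return out

def rotate_yaw(vertices, angle_rad):
    """Rotate vertices about the y axis (only one of the three rotation angles)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c * x + s * z, y, -s * x + c * z] for x, y, z in vertices]

# Toy two-vertex "face" with two hypothetical expression targets.
NEUTRAL = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
JAW_OPEN = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0]]
SMILE = [[0.0, 0.0, 0.0], [1.0, 0.5, 0.0]]

def animate_frame(weights, yaw_rad):
    """Pose the toy face from one frame of (hypothetical) predicted outputs."""
    blended = apply_blendshapes(NEUTRAL, [JAW_OPEN, SMILE], weights)
    return rotate_yaw(blended, yaw_rad)
```

In a speech-driven setting, `weights` and `yaw_rad` would come from the network's per-frame prediction rather than being set by hand, which is why no explicit emotion label is needed to reproduce an expression.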
Cite
Text
Pham et al. "Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.287
Markdown
[Pham et al. "Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/pham2017cvprw-speechdriven/) doi:10.1109/CVPRW.2017.287
BibTeX
@inproceedings{pham2017cvprw-speechdriven,
title = {{Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach}},
author = {Pham, Hai Xuan and Cheung, Samuel and Pavlovic, Vladimir},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2017},
pages = {2328-2336},
doi = {10.1109/CVPRW.2017.287},
url = {https://mlanthology.org/cvprw/2017/pham2017cvprw-speechdriven/}
}