Speech-Driven Facial Animation Using Manifold Relevance Determination
Abstract
In this paper, a new approach to visual speech synthesis using a joint probabilistic model is introduced, namely a Gaussian process latent variable model with manifold relevance determination (MRD), which explicitly models coarticulation. A talking-head dataset (the LIPS dataset) is processed by extracting visual and audio features from its sequences. The model can capture the structure of extremely high-dimensional data. Distinguishable visual features can be inferred directly from the trained model by sampling from the discovered latent points. The inferred visual features are evaluated statistically against ground-truth data and compared with a state-of-the-art visual speech synthesis approach. The quantitative results demonstrate that the proposed approach outperforms the state-of-the-art technique.
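A minimal sketch of this kind of pipeline is shown below, using the GPy library's MRD implementation rather than the authors' code. The feature extraction step, the feature dimensionalities, the latent dimensionality Q, and the random placeholder data are all illustrative assumptions; in the paper, per-frame visual and audio features are extracted from the LIPS sequences.

```python
import numpy as np
import GPy  # provides GPy.models.MRD

# Placeholder per-frame features standing in for the extracted
# visual and audio features of the LIPS sequences (dims assumed).
n_frames = 200
Y_visual = np.random.randn(n_frames, 40)  # hypothetical visual feature dim
Y_audio = np.random.randn(n_frames, 13)   # hypothetical audio (MFCC) dim

# Joint probabilistic model: both modalities share one latent space,
# and MRD's ARD kernel weights determine which latent dimensions are
# shared between the views and which are private to one of them.
Q = 8  # assumed latent dimensionality
m = GPy.models.MRD([Y_visual, Y_audio], input_dim=Q,
                   kernel=GPy.kern.RBF(Q, ARD=True))
m.optimize(messages=True, max_iters=1000)

# Synthesis: decode visual features from latent points. Yindex=0
# selects the visual view in this two-view setup.
X_mean = m.X.mean.values            # posterior means of latent points
Y_pred, Y_pred_var = m.predict(X_mean, Yindex=0)
```

For speech-driven synthesis proper, one would first infer the latent coordinates of unseen audio (GPy exposes an infer_newX method for this) and then decode the corresponding visual view; the sketch above only reconstructs visual features from the training latents.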
Cite

Text
Dawood, Samia, Yulia Hicks, and A. David Marshall. "Speech-Driven Facial Animation Using Manifold Relevance Determination." European Conference on Computer Vision, 2016, pp. 869-882. doi:10.1007/978-3-319-48881-3_57

BibTeX
@inproceedings{dawood2016eccv-speech,
title = {{Speech-Driven Facial Animation Using Manifold Relevance Determination}},
author = {Dawood, Samia and Hicks, Yulia and Marshall, A. David},
booktitle = {European Conference on Computer Vision},
year = {2016},
pages = {869--882},
doi = {10.1007/978-3-319-48881-3_57},
url = {https://mlanthology.org/eccv/2016/dawood2016eccv-speech/}
}