Flow-Based Unconstrained Lip to Speech Generation
Abstract
Unconstrained lip-to-speech aims to generate the corresponding speech from silent facial videos with no restriction on head pose or vocabulary. In such unconstrained settings, it is desirable to generate speech that is intelligible, natural, and produced quickly. To handle these more complicated scenarios, most existing methods adopt an autoregressive architecture optimized with the MSE loss. Although these methods achieve promising performance, they suffer from high inference latency and mel-spectrogram over-smoothing. To tackle these problems, we propose a novel flow-based non-autoregressive lip-to-speech model (GlowLTS) that breaks the autoregressive constraint and achieves faster inference. Concretely, we adopt a flow-based decoder that is optimized by maximizing the likelihood of the training data and is capable of generating more natural speech at higher speed. Moreover, we devise a condition module to improve the intelligibility of the generated speech. We demonstrate the superiority of our proposed method through objective and subjective evaluation on the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets. Our demo video can be found at https://glowlts.github.io/.
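
As an illustration of the training objective the abstract refers to, the sketch below shows how a conditional normalizing flow (Glow-style) can map mel-spectrogram frames to a Gaussian latent, conditioned on per-frame visual features, and be trained by maximizing the exact log-likelihood via the change-of-variables formula. This is a minimal, hypothetical PyTorch sketch: the module names, dimensions, and the single coupling layer are illustrative assumptions and do not reproduce the authors' GlowLTS architecture or its condition module.

# Minimal sketch of a conditional flow decoder trained by maximum likelihood.
# All names and sizes below (ConditionalAffineCoupling, mel_dim=80, cond_dim=256)
# are illustrative assumptions, not the paper's implementation.
import math
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One invertible affine coupling step conditioned on visual features."""
    def __init__(self, mel_dim=80, cond_dim=256, hidden=256):
        super().__init__()
        self.half = mel_dim // 2
        # Predict scale and shift for the second half of each mel frame
        # from its first half plus the visual condition.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (mel_dim - self.half)),
        )

    def forward(self, x, cond):
        xa, xb = x[..., :self.half], x[..., self.half:]
        log_s, t = self.net(torch.cat([xa, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)          # bound scales for numerical stability
        yb = xb * torch.exp(log_s) + t     # invertible affine transform
        logdet = log_s.sum(dim=-1)         # log|det Jacobian| per frame
        return torch.cat([xa, yb], dim=-1), logdet

def flow_nll(mel, cond, flow):
    """Negative log-likelihood under a standard Gaussian prior on the latent."""
    z, logdet = flow(mel, cond)
    log_prior = -0.5 * (z ** 2).sum(dim=-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
    return -(log_prior + logdet).mean()

# Toy usage: batch of 4 clips, 50 mel frames each, 256-d visual condition per frame.
flow = ConditionalAffineCoupling()
mel = torch.randn(4, 50, 80)
cond = torch.randn(4, 50, 256)
loss = flow_nll(mel, cond, flow)
loss.backward()

At inference time such a flow is run in reverse: a latent is sampled from the Gaussian prior and inverted through the coupling layers in parallel over all frames, which is what allows non-autoregressive generation without the frame-by-frame latency of autoregressive decoders.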
Cite
Text
He et al. "Flow-Based Unconstrained Lip to Speech Generation." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I1.19966
Markdown
[He et al. "Flow-Based Unconstrained Lip to Speech Generation." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/he2022aaai-flow/) doi:10.1609/AAAI.V36I1.19966
BibTeX
@inproceedings{he2022aaai-flow,
title = {{Flow-Based Unconstrained Lip to Speech Generation}},
author = {He, Jinzheng and Zhao, Zhou and Ren, Yi and Liu, Jinglin and Huai, Baoxing and Yuan, Nicholas Jing},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2022},
pages = {843-851},
doi = {10.1609/AAAI.V36I1.19966},
url = {https://mlanthology.org/aaai/2022/he2022aaai-flow/}
}