Flow-Based Unconstrained Lip to Speech Generation
Abstract
Unconstrained lip-to-speech aims to generate the corresponding speech from silent facial videos with no restriction on head pose or vocabulary. In such unconstrained settings, it is desirable to generate speech that is intelligible, natural, and produced quickly. To handle these more complicated scenarios, most existing methods adopt an autoregressive architecture optimized with the MSE loss. Although these methods achieve promising performance, they suffer from high inference latency and mel-spectrogram over-smoothing. To tackle these problems, we propose a novel flow-based non-autoregressive lip-to-speech model (GlowLTS) that breaks the autoregressive constraint and achieves faster inference. Concretely, we adopt a flow-based decoder that is optimized by maximizing the likelihood of the training data and is capable of generating more natural speech at higher speed. Moreover, we devise a condition module to improve the intelligibility of the generated speech. We demonstrate the superiority of our proposed method through objective and subjective evaluation on the Lip2Wav-Chemistry-Lectures and Lip2Wav-Chess-Analysis datasets. Our demo video can be found at https://glowlts.github.io/.
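
As an illustration of the training objective the abstract refers to, the sketch below shows how a conditional normalizing flow (Glow-style) can map mel-spectrogram frames to a Gaussian latent, conditioned on per-frame visual features, and be trained by maximizing the exact log-likelihood via the change-of-variables formula. This is a minimal, hypothetical PyTorch sketch: the module names, dimensions, and the single coupling layer are illustrative assumptions and do not reproduce the authors' GlowLTS architecture or its condition module.

# Minimal sketch of a conditional flow decoder trained by maximum likelihood.
# All names and sizes below (ConditionalAffineCoupling, mel_dim=80, cond_dim=256)
# are illustrative assumptions, not the paper's implementation.
import math
import torch
import torch.nn as nn

class ConditionalAffineCoupling(nn.Module):
    """One invertible affine coupling step conditioned on visual features."""
    def __init__(self, mel_dim=80, cond_dim=256, hidden=256):
        super().__init__()
        self.half = mel_dim // 2
        # Predict scale and shift for the second half of each mel frame
        # from its first half plus the visual condition.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (mel_dim - self.half)),
        )

    def forward(self, x, cond):
        xa, xb = x[..., :self.half], x[..., self.half:]
        log_s, t = self.net(torch.cat([xa, cond], dim=-1)).chunk(2, dim=-1)
        log_s = torch.tanh(log_s)          # bound scales for numerical stability
        yb = xb * torch.exp(log_s) + t     # invertible affine transform
        logdet = log_s.sum(dim=-1)         # log|det Jacobian| per frame
        return torch.cat([xa, yb], dim=-1), logdet

def flow_nll(mel, cond, flow):
    """Negative log-likelihood under a standard Gaussian prior on the latent."""
    z, logdet = flow(mel, cond)
    log_prior = -0.5 * (z ** 2).sum(dim=-1) - 0.5 * z.shape[-1] * math.log(2 * math.pi)
    return -(log_prior + logdet).mean()

# Toy usage: batch of 4 clips, 50 mel frames each, 256-d visual condition per frame.
flow = ConditionalAffineCoupling()
mel = torch.randn(4, 50, 80)
cond = torch.randn(4, 50, 256)
loss = flow_nll(mel, cond, flow)
loss.backward()

At inference time such a flow is run in reverse: a latent is sampled from the Gaussian prior and inverted through the coupling layers in parallel over all frames, which is what allows non-autoregressive generation without the frame-by-frame latency of autoregressive decoders.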
Cite
Text
He et al. "Flow-Based Unconstrained Lip to Speech Generation." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/AAAI.V36I1.19966
Markdown
[He et al. "Flow-Based Unconstrained Lip to Speech Generation." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/he2022aaai-flow/) doi:10.1609/AAAI.V36I1.19966
BibTeX
@inproceedings{he2022aaai-flow,
title = {{Flow-Based Unconstrained Lip to Speech Generation}},
author = {He, Jinzheng and Zhao, Zhou and Ren, Yi and Liu, Jinglin and Huai, Baoxing and Yuan, Nicholas Jing},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2022},
pages = {843-851},
doi = {10.1609/AAAI.V36I1.19966},
url = {https://mlanthology.org/aaai/2022/he2022aaai-flow/}
}