Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario

Abstract

A face often comes with a voice, and a person's appearance can be strongly related to how they sound. In this work, we study how a face can be converted to a voice, that is, face-based voice conversion. Since there is no clean dataset that contains both face and speech, face-based voice conversion is hard to train and suffers from low-quality output caused by background noise and echo. Too much redundant information in the face-to-voice mapping also causes the model to synthesize a generic style of speech. Furthermore, previous work tried to disentangle speech by adjusting a bottleneck, but it is hard to decide on the size of the bottleneck. Therefore, we propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing a generic style of speech, we utilize framewise facial embeddings. We apply adversarial learning with a multi-scale discriminator so that the model achieves better quality. In addition, a self-attention module is added to focus on content-related features in in-the-wild data. Quantitative experiments show that our method outperforms previous work.
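
The abstract mentions a self-attention module over speech features but gives no architectural details. As a rough illustration only, the following is a minimal sketch of generic scaled dot-product self-attention over a sequence of frame features. It is not the authors' module: the function names, the toy input, and the use of identity projections (queries, keys, and values all equal to the input) are assumptions made for brevity.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(frames):
    """Scaled dot-product self-attention with identity projections.

    frames: list of T feature vectors, each a list of d floats.
    Returns (outputs, weights): outputs holds T vectors of size d,
    weights is a T x T matrix whose rows each sum to 1.
    Assumption: Q = K = V = frames; a trained model would use
    learned projection matrices instead.
    """
    d = len(frames[0])
    scale = math.sqrt(d)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the sequence.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all frame vectors.
        outputs.append(
            [sum(w * value[j] for w, value in zip(row, frames)) for j in range(d)]
        )
    return outputs, weights


# Toy input: three frames with two-dimensional features (hypothetical values).
toy_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
attended, attention = self_attention(toy_frames)
```

In this toy input the first frame attends more strongly to the second frame, which is similar to it, than to the third, which is orthogonal. That is the sense in which attention can emphasize related features across time.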

Cite

Text

Weng et al. "Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I11.26607

Markdown

[Weng et al. "Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/weng2023aaai-zero/) doi:10.1609/AAAI.V37I11.26607

BibTeX

@inproceedings{weng2023aaai-zero,
  title     = {{Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario}},
  author    = {Weng, Shao-En and Shuai, Hong-Han and Cheng, Wen-Huang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2023},
  pages     = {13718--13726},
  doi       = {10.1609/AAAI.V37I11.26607},
  url       = {https://mlanthology.org/aaai/2023/weng2023aaai-zero/}
}