Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario
Abstract
A face often comes with a voice, and a person's appearance can be strongly related to how they sound. In this work, we study how a face can be converted to a voice, that is, face-based voice conversion. Because no clean dataset pairs faces with speech, training is difficult and the synthesized speech is low in quality, owing to background noise and echo in the available data. Excess redundant information in the face-to-voice mapping also causes the model to synthesize a generic style of speech. Furthermore, previous work disentangles speech by adjusting a bottleneck, but the bottleneck size is hard to choose. We therefore propose a bottleneck-free strategy for speech disentanglement. To avoid synthesizing a generic style of speech, we use framewise facial embeddings. We apply adversarial learning with a multi-scale discriminator to improve synthesis quality. In addition, we add a self-attention module that focuses on content-related features in in-the-wild data. Quantitative experiments show that our method outperforms previous work.
Cite
Text
Weng et al. "Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario." AAAI Conference on Artificial Intelligence, 2023. doi:10.1609/AAAI.V37I11.26607
Markdown
[Weng et al. "Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario." AAAI Conference on Artificial Intelligence, 2023.](https://mlanthology.org/aaai/2023/weng2023aaai-zero/) doi:10.1609/AAAI.V37I11.26607
BibTeX
@inproceedings{weng2023aaai-zero,
title = {{Zero-Shot Face-Based Voice Conversion: Bottleneck-Free Speech Disentanglement in the Real-World Scenario}},
author = {Weng, Shao-En and Shuai, Hong-Han and Cheng, Wen-Huang},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2023},
pages = {13718-13726},
doi = {10.1609/AAAI.V37I11.26607},
url = {https://mlanthology.org/aaai/2023/weng2023aaai-zero/}
}