Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization
Abstract
Articulatory synthesis seeks to replicate the human voice by modeling the physics of the vocal apparatus, offering interpretable and controllable speech production. However, such methods often require careful hand-tuning to invert acoustic signals into their articulatory parameters. We present VocalTrax, a method that performs this inversion automatically by optimizing the parameters of an accelerated vocal tract model implementation. Experiments on diverse vocal datasets show significant improvements over existing methods in out-of-domain speech reconstruction, while also revealing persistent challenges in matching natural voice quality.
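Inversion by optimization is an analysis-by-synthesis loop: a forward model maps control parameters to sound, and an optimizer adjusts those parameters until the synthesized output matches a target. The sketch below illustrates only that loop, under heavy assumptions. The forward model `synthesize` is a hypothetical toy whose parameters are resonance center frequencies rather than true articulatory controls, the log-spectral objective `spectral_loss` is an assumed choice, and the derivative-free compass search in `invert` is swapped in for illustration. None of these is VocalTrax's actual model, loss, or optimizer, which the abstract does not specify.

```python
import math

# Analysis frequencies (Hz) at which the toy envelope is sampled (assumption).
FREQS = [50.0 * k for k in range(1, 81)]  # 50 Hz to 4000 Hz
BANDWIDTH = 100.0  # Hz; one fixed bandwidth for every resonance (assumption)


def synthesize(params):
    """Hypothetical toy forward model standing in for a vocal tract model:
    log-magnitude envelope of a cascade of resonances whose center
    frequencies (Hz) are the control parameters."""
    return [
        sum(-0.5 * math.log1p(((f - fc) / BANDWIDTH) ** 2) for fc in params)
        for f in FREQS
    ]


def spectral_loss(params, target_env):
    """Mean squared log-spectral distance between synthesized and target envelopes."""
    env = synthesize(params)
    return sum((a - b) ** 2 for a, b in zip(env, target_env)) / len(env)


def invert(target_env, init, step=200.0, min_step=0.5, max_iters=10_000):
    """Recover control parameters by compass (pattern) search: try +/- step on
    each parameter, keep any move that lowers the loss, and halve the step
    whenever a full sweep brings no improvement."""
    params = list(init)
    best = spectral_loss(params, target_env)
    for _ in range(max_iters):
        if step < min_step:
            break
        improved = False
        for i in range(len(params)):
            for delta in (step, -step):
                trial = params.copy()
                trial[i] += delta
                loss = spectral_loss(trial, target_env)
                if loss < best:
                    params, best, improved = trial, loss, True
                    break
        if not improved:
            step /= 2.0
    return params, best


if __name__ == "__main__":
    # Hypothetical "ground truth" used only to fabricate a synthetic target.
    true_params = [700.0, 1200.0, 2600.0]
    target = synthesize(true_params)
    # Start from evenly spaced resonances (roughly those of a uniform tube).
    init = [500.0, 1500.0, 2500.0]
    recovered, final_loss = invert(target, init)
    print([round(p) for p in recovered], final_loss)
```

Compass search is used here only because it needs no gradients and keeps the example dependency-free; with a differentiable model implementation, gradient-based optimization would be a natural alternative. Because the target in this example is generated by the same toy model, an exact fit exists; with real recordings the forward model can only approximate the target, so the loss will generally not reach zero.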
Cite
Text
Mo et al. "Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization." NeurIPS 2024 Workshops: Audio_Imagination, 2024.
Markdown
[Mo et al. "Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization." NeurIPS 2024 Workshops: Audio_Imagination, 2024.](https://mlanthology.org/neuripsw/2024/mo2024neuripsw-articulatory/)
BibTeX
@inproceedings{mo2024neuripsw-articulatory,
title = {{Articulatory Synthesis of Speech and Diverse Vocal Sounds via Optimization}},
author = {Mo, Luke and Cherep, Manuel and Singh, Nikhil and Langford, Quinn and Maes, Patricia},
booktitle = {NeurIPS 2024 Workshops: Audio_Imagination},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/mo2024neuripsw-articulatory/}
}