SSDM: Scalable Speech Dysfluency Modeling
Abstract
Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, three challenges remain. First, current state-of-the-art solutions~\cite{lian2023unconstrained-udm, lian-anumanchipalli-2024-towards-hudm} suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose \textit{SSDM: Scalable Speech Dysfluency Modeling}, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at \url{https://berkeley-speech-group.github.io/SSDM/}.
Cite
Text
Lian et al. "SSDM: Scalable Speech Dysfluency Modeling." Neural Information Processing Systems, 2024. doi:10.52202/079017-3230
Markdown
[Lian et al. "SSDM: Scalable Speech Dysfluency Modeling." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lian2024neurips-ssdm/) doi:10.52202/079017-3230
BibTeX
@inproceedings{lian2024neurips-ssdm,
title = {{SSDM: Scalable Speech Dysfluency Modeling}},
author = {Lian, Jiachen and Zhou, Xuanru and Ezzes, Zoe and Vonk, Jet and Morin, Brittany and Baquirin, David and Miller, Zachary and Gorno Tempini, Maria Luisa and Anumanchipalli, Gopala},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-3230},
url = {https://mlanthology.org/neurips/2024/lian2024neurips-ssdm/}
}