Cauchy Diffusion: A Heavy-Tailed Denoising Diffusion Probabilistic Model for Speech Synthesis

Abstract

Denoising diffusion probabilistic models (DDPMs) have become popular for building neural vocoders and achieve outstanding performance. However, existing DDPM-based neural vocoders struggle to capture prosodic diversity because they are susceptible to mode collapse when confronted with imbalanced data. We introduce Cauchy Diffusion, a model that incorporates Cauchy noise to address this challenge. The heavy-tailed Cauchy distribution is more resilient to imbalanced speech data, potentially improving prosody modeling. Our experiments on the LJSpeech and VCTK datasets demonstrate that Cauchy Diffusion achieves state-of-the-art speech synthesis performance. Compared to existing neural vocoders, Cauchy Diffusion notably improves speech diversity while maintaining superior speech quality. Remarkably, it surpasses neural vocoders based on generative adversarial networks (GANs) that are explicitly optimized to improve diversity.
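The abstract's key idea is that the Cauchy distribution has much heavier tails than the Gaussian used in standard DDPMs, so rare events (such as atypical prosody in imbalanced speech data) are assigned far more probability mass. The following sketch is only an illustration of that distributional property, not the paper's implementation:

```python
import numpy as np

# Illustration only (not the paper's method): compare how often Gaussian
# noise (standard DDPM) and Cauchy noise produce extreme "tail" samples.
rng = np.random.default_rng(0)
n = 100_000

gauss = rng.standard_normal(n)    # N(0, 1) noise, as in a vanilla DDPM
cauchy = rng.standard_cauchy(n)   # standard Cauchy(0, 1) noise

# Fraction of samples landing far in the tails (|x| > 4).
gauss_tail = np.mean(np.abs(gauss) > 4)
cauchy_tail = np.mean(np.abs(cauchy) > 4)

print(f"P(|x|>4)  Gaussian: {gauss_tail:.5f}  Cauchy: {cauchy_tail:.5f}")
```

For a standard Gaussian, P(|x| > 4) is about 6e-5, while for a standard Cauchy it is roughly 0.156 (analytically, 1 - (2/π)·arctan(4)), i.e. thousands of times larger. This tail mass is what the authors argue helps the diffusion model avoid collapsing onto the dominant modes of imbalanced speech data.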

Cite

Text

Lian et al. "Cauchy Diffusion: A Heavy-Tailed Denoising Diffusion Probabilistic Model for Speech Synthesis." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/aaai.v39i23.34634

Markdown

[Lian et al. "Cauchy Diffusion: A Heavy-Tailed Denoising Diffusion Probabilistic Model for Speech Synthesis." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/lian2025aaai-cauchy/) doi:10.1609/aaai.v39i23.34634

BibTeX

@inproceedings{lian2025aaai-cauchy,
  title     = {{Cauchy Diffusion: A Heavy-Tailed Denoising Diffusion Probabilistic Model for Speech Synthesis}},
  author    = {Lian, Qi and Qi, Yu and Wang, Yueming},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {24549--24557},
  doi       = {10.1609/aaai.v39i23.34634},
  url       = {https://mlanthology.org/aaai/2025/lian2025aaai-cauchy/}
}