Diffusion-Based Speech Enhancement: Demonstration of Performance and Generalization
Abstract
This demo presents advanced techniques in speech enhancement using deep generative models. It highlights the generalization capabilities of score-based generative models for speech enhancement and compares them directly with Schrödinger bridge approaches. The presented methods focus on generating high-quality super-wideband speech at a sampling rate of 48 kHz. Participants will record speech using a single microphone in a noisy environment, such as a conference venue. These recordings will then be enhanced and played back through headphones, demonstrating the model's effectiveness in improving speech quality and intelligibility.
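To make the enhancement step described above concrete, the following is a minimal sketch of reverse-diffusion (Euler-Maruyama) sampling for score-based speech enhancement in the complex STFT domain. The function name, the variance-exploding noise schedule, the coefficients, and the dummy score network are illustrative assumptions for demonstration only; they are not the authors' released implementation.

```python
# Hypothetical sketch of Euler-Maruyama sampling for score-based speech
# enhancement in the complex STFT domain. The schedule, coefficients, and the
# dummy score network are illustrative placeholders, not the released model.
import math
import torch

def reverse_diffusion_enhance(y_stft, score_model, n_steps=30,
                              sigma_min=0.05, sigma_max=0.5):
    """Enhance a noisy complex spectrogram by integrating a reverse-time SDE.

    y_stft:      complex tensor (freq, time) holding the noisy recording's STFT.
    score_model: callable (x_t, y, sigma) -> estimated score, same shape as x_t.
    """
    # Geometric noise schedule from sigma_max down to sigma_min.
    sigmas = torch.exp(torch.linspace(math.log(sigma_max), math.log(sigma_min), n_steps))
    # Start the reverse process from the noisy observation plus Gaussian noise.
    noise = torch.randn_like(y_stft.real) + 1j * torch.randn_like(y_stft.real)
    x = y_stft + sigma_max * noise
    for i in range(n_steps - 1):
        sigma, sigma_next = sigmas[i], sigmas[i + 1]
        step = sigma**2 - sigma_next**2          # positive variance decrement
        score = score_model(x, y_stft, sigma)
        # Drift toward clean speech, plus a small stochastic exploration term.
        z = torch.randn_like(x.real) + 1j * torch.randn_like(x.real)
        x = x + step * score + torch.sqrt(step) * z
    return x  # enhanced complex spectrogram; invert the STFT to get a waveform

if __name__ == "__main__":
    # Toy usage: a dummy "score" that simply pulls the estimate toward the input.
    dummy_score = lambda x, y, sigma: (y - x) / (sigma**2 + 1e-8)
    y = torch.randn(257, 100) + 1j * torch.randn(257, 100)   # fake noisy STFT
    print(reverse_diffusion_enhance(y, dummy_score).shape)   # torch.Size([257, 100])
```

In practice a trained score network conditioned on the noisy spectrogram would replace the dummy score, and the waveform would be recovered with an inverse STFT at 48 kHz.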
Cite
Text
Richter and Gerkmann. "Diffusion-Based Speech Enhancement: Demonstration of Performance and Generalization." NeurIPS 2024 Workshops: Audio_Imagination, 2024.
Markdown
[Richter and Gerkmann. "Diffusion-Based Speech Enhancement: Demonstration of Performance and Generalization." NeurIPS 2024 Workshops: Audio_Imagination, 2024.](https://mlanthology.org/neuripsw/2024/richter2024neuripsw-diffusionbased/)
BibTeX
@inproceedings{richter2024neuripsw-diffusionbased,
title = {{Diffusion-Based Speech Enhancement: Demonstration of Performance and Generalization}},
author = {Richter, Julius and Gerkmann, Timo},
booktitle = {NeurIPS 2024 Workshops: Audio_Imagination},
year = {2024},
url = {https://mlanthology.org/neuripsw/2024/richter2024neuripsw-diffusionbased/}
}