BridgeVoC: Insights into Using Schrödinger Bridge for Neural Vocoders
Abstract
While previous diffusion-based neural vocoders typically follow a noise-to-data generation pipeline, the linear-degradation prior of the mel-spectrogram is often neglected, resulting in limited generation quality. By revisiting the vocoder task and uncovering its connection with the signal restoration task, this paper proposes a novel time-frequency (T-F) domain-based neural vocoder with the Schrödinger Bridge, called BridgeVoC, which is the first to follow the data-to-data generation paradigm. Specifically, the mel-spectrogram can be projected into the target linear-scale domain and regarded as a degraded spectral representation with a deficient rank distribution. Based on this, the Schrödinger Bridge is leveraged to establish a connection between the degraded and target data distributions. During the inference stage, starting from the degraded representation, the target spectrum can be gradually restored rather than generated from a Gaussian noise process. We conduct extensive experiments on the LJSpeech and LibriTTS benchmarks. Quantitative and qualitative results demonstrate that the proposed method achieves faster inference and outperforms existing diffusion-based vocoder baselines, while also achieving competitive or better performance compared to other non-diffusion state-of-the-art methods across multiple evaluation metrics.
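The rank-deficiency observation in the abstract can be illustrated with a small numerical sketch. The abstract does not say how the projection into the linear-scale domain is performed, so the code below assumes a Moore-Penrose pseudo-inverse of a triangular mel filterbank; the filterbank construction, the parameter values (`n_fft`, `sr`, `n_mels`), the random stand-in spectrogram, and the helper names are all hypothetical choices made for illustration, not the authors' implementation.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (an assumption; other mel variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filterbank of shape (n_mels, n_fft // 2 + 1)."""
    n_freqs = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_freqs)
    # n_mels + 2 band edges, equally spaced on the mel scale.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_freqs))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def numerical_rank(x, rel_tol=1e-8):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(x, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Hypothetical analysis settings and a random stand-in for a magnitude spectrogram.
n_fft, sr, n_mels, n_frames = 1024, 22050, 80, 200
rng = np.random.default_rng(0)
target = np.abs(rng.standard_normal((n_fft // 2 + 1, n_frames)))

fb = mel_filterbank(n_mels, n_fft, sr)
mel = fb @ target                      # linear degradation: (n_mels, n_frames)
degraded = np.linalg.pinv(fb) @ mel    # projected back to the linear-scale domain

rank_target = numerical_rank(target)
rank_degraded = numerical_rank(degraded)
print("target rank:", rank_target, "degraded rank:", rank_degraded)
```

Because the degraded representation is the product of a matrix with `n_mels` columns and the mel-spectrogram, its rank cannot exceed `n_mels`, whereas the full-resolution target is not constrained in this way. Under the assumptions above, this is the gap that a data-to-data model would have to restore.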
Cite
Text
Lei et al. "BridgeVoC: Insights into Using Schrödinger Bridge for Neural Vocoders." ICLR 2025 Workshops: DeLTa, 2025.
Markdown
[Lei et al. "BridgeVoC: Insights into Using Schrödinger Bridge for Neural Vocoders." ICLR 2025 Workshops: DeLTa, 2025.](https://mlanthology.org/iclrw/2025/lei2025iclrw-bridgevoc/)
BibTeX
@inproceedings{lei2025iclrw-bridgevoc,
title = {{BridgeVoC: Insights into Using Schrödinger Bridge for Neural Vocoders}},
author = {Lei, Tong and Li, Andong and Chen, Rilin and Yu, Dong and Yu, Meng and Lu, Jing and Zheng, Chengshi},
booktitle = {ICLR 2025 Workshops: DeLTa},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/lei2025iclrw-bridgevoc/}
}