DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation

Abstract

In this work, we present DFlow, a novel generative framework that combines a Normalizing Flow (NF) with a Denoising AutoEncoder (DAE) for high-fidelity waveform generation. With a carefully designed structure, DFlow integrates the capabilities of both NF and DAE, yielding significantly better performance than standard NF models. In our experiments, DFlow achieves the highest MOS among existing methods on commonly used datasets and the fastest synthesis speed among all likelihood-based models. We further demonstrate the generalization ability of DFlow by generating high-quality out-of-distribution audio samples, such as singing and music. Additionally, we extend the capacity of DFlow by scaling up both the model size and the training set size. Our large-scale universal vocoder, DFlow-XL, achieves highly competitive performance against the best universal vocoder, BigVGAN.

Cite

Text

Miao et al. "DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation." International Conference on Machine Learning, 2024.

Markdown

[Miao et al. "DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/miao2024icml-dflow/)

BibTeX

@inproceedings{miao2024icml-dflow,
  title     = {{DFlow: A Generative Model Combining Denoising AutoEncoder and Normalizing Flow for High Fidelity Waveform Generation}},
  author    = {Miao, Chenfeng and Zhu, Qingying and Chen, Minchuan and Hu, Wei and Li, Zijian and Wang, Shaojun and Xiao, Jing},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {35590--35606},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/miao2024icml-dflow/}
}