Vocos: Closing the Gap Between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis

Abstract

Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in the time domain. While effective, this approach neglects the inductive bias offered by time-frequency representations, resulting in redundant and computationally intensive upsampling operations. Fourier-based time-frequency representation is an appealing alternative, aligning more accurately with human auditory perception, and benefiting from well-established fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting Vocos, a new model that directly generates Fourier spectral coefficients. Vocos not only matches the state-of-the-art in audio quality, as demonstrated in our evaluations, but it also substantially improves computational efficiency, achieving an order of magnitude increase in speed compared to prevailing time-domain neural vocoding approaches. The source code and model weights have been open-sourced.
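
To illustrate why generating Fourier coefficients can avoid learned upsampling, the sketch below turns a frame-rate spectrogram into a waveform with a single inverse short-time Fourier transform (overlap-add). It is a minimal NumPy illustration and not the released Vocos code. The abstract does not say how the coefficients are parametrized, so the magnitude-and-phase split is an assumption, and the function names (`stft`, `istft`, `coefficients_to_waveform`), the periodic Hann window, and the frame and hop sizes are all hypothetical choices made for the example.

```python
import numpy as np

def stft(x, n_fft=64, hop=16):
    """Forward STFT with a periodic Hann window (used here only to make test data)."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def istft(spec, n_fft=64, hop=16):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    window = np.hanning(n_fft + 1)[:-1]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    for i in range(n_frames):
        out[i * hop : i * hop + n_fft] += frames[i]
        norm[i * hop : i * hop + n_fft] += window ** 2
    # Guard against division by zero where the window sum vanishes (signal edges).
    return out / np.maximum(norm, 1e-8)

def coefficients_to_waveform(magnitude, phase, n_fft=64, hop=16):
    """Assumed parametrization: combine magnitude and phase into complex
    coefficients, then synthesize audio with one inverse STFT."""
    spec = magnitude * np.exp(1j * phase)
    return istft(spec, n_fft, hop)
```

In this sketch a model would only need to produce `magnitude` and `phase` at the frame rate; the inverse transform handles the expansion to the sample rate. Passing the magnitude and phase of a real signal's STFT through `coefficients_to_waveform` recovers the signal away from its edges, where the window sum is too small to normalize reliably.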

Cite

Text

Siuzdak. "Vocos: Closing the Gap Between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis." International Conference on Learning Representations, 2024.

Markdown

[Siuzdak. "Vocos: Closing the Gap Between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis." International Conference on Learning Representations, 2024.](https://mlanthology.org/iclr/2024/siuzdak2024iclr-vocos/)

BibTeX

@inproceedings{siuzdak2024iclr-vocos,
  title     = {{Vocos: Closing the Gap Between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis}},
  author    = {Siuzdak, Hubert},
  booktitle = {International Conference on Learning Representations},
  year      = {2024},
  url       = {https://mlanthology.org/iclr/2024/siuzdak2024iclr-vocos/}
}