Audio Super-Resolution Using Neural Networks

Abstract

We propose a neural network-based technique for enhancing the quality of audio signals such as speech or music by transforming inputs encoded at low sampling rates into higher-quality signals with an increased resolution in the time domain. This amounts to generating the missing samples within the low-resolution signal in a process akin to image super-resolution. On standard speech and music datasets, this approach outperforms baselines at 2x, 4x, and 6x upscaling ratios. The method has practical applications in telephony, compression, and text-to-speech generation; it can also be used to improve the scalability of recently proposed generative models of audio.
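The task described above — recovering the missing samples of a signal recorded at a low sampling rate — can be illustrated with a minimal sketch. The code below is not the paper's neural network; it shows the problem setup together with a naive linear-interpolation baseline of the kind such models are typically compared against. All function names are illustrative.

```python
def downsample(signal, ratio):
    """Keep every `ratio`-th sample, simulating a low-sampling-rate recording."""
    return signal[::ratio]

def upsample_linear(low_res, ratio):
    """Reconstruct the missing samples by linear interpolation.

    A learned model (e.g. the paper's network) would replace this step,
    predicting the in-between samples instead of interpolating them.
    """
    high_res = []
    for i in range(len(low_res) - 1):
        a, b = low_res[i], low_res[i + 1]
        for k in range(ratio):
            # Insert `ratio` evenly spaced samples between each pair of neighbours.
            high_res.append(a + (b - a) * k / ratio)
    high_res.append(low_res[-1])
    return high_res

# 4x upscaling: 4 low-resolution samples become 13 reconstructed samples.
low = [0.0, 1.0, 0.0, -1.0]
rec = upsample_linear(low, 4)
```

The super-resolution model's job is to do strictly better than this interpolation baseline at the same 2x/4x/6x ratios, filling in high-frequency detail that interpolation cannot recover.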

Cite

Text

Kuleshov et al. "Audio Super-Resolution Using Neural Networks." International Conference on Learning Representations, 2017.

Markdown

[Kuleshov et al. "Audio Super-Resolution Using Neural Networks." International Conference on Learning Representations, 2017.](https://mlanthology.org/iclr/2017/kuleshov2017iclr-audio/)

BibTeX

@inproceedings{kuleshov2017iclr-audio,
  title     = {{Audio Super-Resolution Using Neural Networks}},
  author    = {Kuleshov, Volodymyr and Enam, S. Zayd and Ermon, Stefano},
  booktitle = {International Conference on Learning Representations},
  year      = {2017},
  url       = {https://mlanthology.org/iclr/2017/kuleshov2017iclr-audio/}
}