Efficient Neural Audio Synthesis
Abstract
Sequential models achieve state-of-the-art results in audio, visual and textual domains with respect to both estimating the data distribution and generating desired samples. However, efficient sampling for this class of models with little to no loss in quality has remained an elusive problem. With a focus on text-to-speech synthesis, we describe a set of general techniques for reducing sampling time while maintaining high output quality. First, we describe a single-layer recurrent neural network, the WaveRNN, with a dual softmax layer that matches the quality of the state-of-the-art WaveNet model. The compact form of the network makes it possible to generate 24 kHz 16-bit audio 4 times faster than real time on a GPU. Second, we apply a weight pruning technique to reduce the number of weights in the WaveRNN. We find that, for a constant number of parameters, large sparse networks perform better than small dense networks, and this relationship holds for sparsity levels beyond 96%. The small number of weights in a Sparse WaveRNN makes it possible to sample high-fidelity audio on a mobile phone CPU in real time. Finally, we describe a new dependency scheme for sampling that lets us trade a constant number of non-local, distant dependencies for the ability to generate samples in batches. The Batch WaveRNN produces 8 samples per step without loss of quality and offers orthogonal ways of further increasing sampling efficiency.
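The dual softmax layer mentioned above factorizes each 16-bit sample into two 8-bit halves, so the network predicts two 256-way distributions instead of a single 65,536-way one. A minimal sketch of that split and its inverse, assuming unsigned 16-bit samples and the hypothetical helper names `split_sample`/`join_sample`:

```python
def split_sample(sample_16bit: int) -> tuple[int, int]:
    """Split an unsigned 16-bit sample into coarse (high) and fine (low) 8-bit parts.

    Each part takes one of 256 values, so each softmax spans 256 classes
    rather than a single 2**16-way output.
    """
    coarse = sample_16bit >> 8    # high 8 bits
    fine = sample_16bit & 0xFF    # low 8 bits
    return coarse, fine


def join_sample(coarse: int, fine: int) -> int:
    """Recombine the two 8-bit parts into the original 16-bit sample."""
    return (coarse << 8) | fine


# Round-trip check on an arbitrary sample value.
sample = 0xABCD
c, f = split_sample(sample)
assert join_sample(c, f) == sample
```

In the model itself, the coarse part is predicted first and the fine softmax is conditioned on it; the sketch only illustrates the bit-level factorization.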
Cite
Text
Kalchbrenner et al. "Efficient Neural Audio Synthesis." International Conference on Machine Learning, 2018.
BibTeX
@inproceedings{kalchbrenner2018icml-efficient,
title = {{Efficient Neural Audio Synthesis}},
  author = {Kalchbrenner, Nal and Elsen, Erich and Simonyan, Karen and Noury, Seb and Casagrande, Norman and Lockhart, Edward and Stimberg, Florian and van den Oord, Aaron and Dieleman, Sander and Kavukcuoglu, Koray},
booktitle = {International Conference on Machine Learning},
year = {2018},
  pages = {2410--2419},
volume = {80},
url = {https://mlanthology.org/icml/2018/kalchbrenner2018icml-efficient/}
}