Deep Audio Priors Emerge from Harmonic Convolutional Networks

Abstract

Convolutional neural networks (CNNs) excel in image recognition and generation. Among many efforts to explain their effectiveness, experiments show that CNNs carry strong inductive biases that capture natural image priors. Do deep networks also have inductive biases for audio signals? In this paper, we empirically show that current network architectures for audio processing show little evidence of capturing such priors. We propose Harmonic Convolution, an operation that helps deep networks distill priors in audio signals by explicitly exploiting their harmonic structure. It does so by designing the kernel's support to be a set of harmonic series, rather than the local neighborhood used by standard convolutional kernels. We show that networks using Harmonic Convolution can reliably model audio priors and achieve high performance in unsupervised audio restoration tasks. With Harmonic Convolution, they also generalize better on sound source separation.
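
To make the idea of a kernel supported on a harmonic series concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. It assumes a magnitude spectrogram with linearly spaced frequency bins where bin 0 is DC, uses only integer multiples of the bin index with no interpolation, and applies one scalar weight per harmonic with no extent along time. The function name `harmonic_conv_1d` and its parameters are hypothetical.

```python
import numpy as np


def harmonic_conv_1d(spec, weights):
    """Illustrative harmonic-support filter (hypothetical helper, not the paper's code).

    spec:    array of shape (F, T), linear frequency bins, bin 0 = DC (assumption).
    weights: array of shape (K,), one weight per harmonic k = 1..K (assumption).

    Computes out[f, t] = sum_k weights[k - 1] * spec[k * f, t],
    skipping harmonics whose bin index k * f falls outside the spectrogram.
    """
    spec = np.asarray(spec, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_freq = spec.shape[0]
    out = np.zeros_like(spec)
    freqs = np.arange(n_freq)
    for k, w in enumerate(weights, start=1):
        idx = k * freqs          # bin of the k-th harmonic of every output bin
        valid = idx < n_freq     # keep only harmonics inside the spectrogram
        out[valid] += w * spec[idx[valid]]
    return out


# Toy input: a single frame with energy at bins 2, 4 and 6,
# i.e. a fundamental at bin 2 and its next two harmonics.
spec = np.zeros((8, 1))
spec[[2, 4, 6], 0] = 1.0
out = harmonic_conv_1d(spec, [1.0, 1.0, 1.0])
peak_bin = int(np.argmax(out[:, 0]))
```

In this toy case the response peaks at the fundamental bin, because that is the only output position whose harmonic support covers all three active bins. A standard convolution of the same size would instead read three adjacent bins and could not gather the energy at bins 2, 4 and 6 in one step.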

Cite

Text

Zhang et al. "Deep Audio Priors Emerge from Harmonic Convolutional Networks." International Conference on Learning Representations, 2020.

Markdown

[Zhang et al. "Deep Audio Priors Emerge from Harmonic Convolutional Networks." International Conference on Learning Representations, 2020.](https://mlanthology.org/iclr/2020/zhang2020iclr-deep/)

BibTeX

@inproceedings{zhang2020iclr-deep,
  title     = {{Deep Audio Priors Emerge from Harmonic Convolutional Networks}},
  author    = {Zhang, Zhoutong and Wang, Yunyun and Gan, Chuang and Wu, Jiajun and Tenenbaum, Joshua B. and Torralba, Antonio and Freeman, William T.},
  booktitle = {International Conference on Learning Representations},
  year      = {2020},
  url       = {https://mlanthology.org/iclr/2020/zhang2020iclr-deep/}
}