Audio Source Separation with Discriminative Scattering Networks
Abstract
Many monaural signal decomposition techniques proposed in the literature operate on a feature space consisting of a time-frequency representation of the input data. A challenge faced by these approaches is to effectively exploit the temporal dependencies of the signals at scales larger than the duration of a time frame. In this work we propose to tackle this problem by modeling the signals using a time-frequency representation with multiple temporal resolutions. To this end, we use a signal representation that consists of a pyramid of wavelet scattering operators, which generalizes Constant-Q Transforms (CQTs) with extra layers of convolution and complex modulus. We first show that learning standard models in this multi-resolution setting improves source separation results over fixed-resolution methods. As a case study, we use Non-Negative Matrix Factorization (NMF), which has been widely used in many audio applications. We then investigate the inclusion of the proposed multi-resolution setting into a discriminative training regime. We discuss several alternatives using different deep neural network architectures, and our preliminary experiments suggest that in this task, finite-impulse, multi-resolution convolutional networks are a competitive baseline compared to recurrent alternatives.
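To make the front end described in the abstract concrete, here is a minimal sketch, not the authors' implementation: it approximates a first-order scattering (CQT-like) representation by taking the complex modulus of an STFT and projecting it onto a constant-Q Gaussian filterbank, then fits NMF to the resulting non-negative features as a fixed-resolution baseline. All function names, filter parameters, the STFT-based approximation, and the use of scikit-learn's NMF are illustrative assumptions; deeper scattering orders and the discriminative networks studied in the paper are not shown.

```python
import numpy as np
from sklearn.decomposition import NMF  # assumed available; any NMF solver would do


def constant_q_bank(n_bins, fs=16000.0, f_min=40.0, bins_per_octave=12):
    """Gaussian band-pass filters with geometrically spaced centers (constant-Q)."""
    freqs = np.linspace(0.0, fs / 2.0, n_bins)
    n_octaves = np.log2((fs / 2.0) / f_min)
    centers = f_min * 2.0 ** (np.arange(int(n_octaves * bins_per_octave))
                              / bins_per_octave)
    bws = centers * (2.0 ** (1.0 / bins_per_octave) - 1.0)  # bandwidth ~ fc / Q
    return np.exp(-0.5 * ((freqs[None, :] - centers[:, None]) / bws[:, None]) ** 2)


def scattering_first_order(x, n_fft=1024, hop=256, fs=16000.0):
    """Approximate |x * psi_lambda| averaged per frame via an STFT magnitude."""
    window = np.hanning(n_fft)
    frames = np.stack([x[i:i + n_fft] * window
                       for i in range(0, len(x) - n_fft, hop)])
    modulus = np.abs(np.fft.rfft(frames, axis=1))   # complex modulus of band-pass outputs
    bank = constant_q_bank(modulus.shape[1], fs=fs)
    return modulus @ bank.T                          # (n_frames, n_filters), non-negative


def nmf_templates(features, n_components=20):
    """Fit NMF: rows of W act as spectral templates, H as their activations in time."""
    model = NMF(n_components=n_components, init="random",
                max_iter=500, random_state=0)
    H = model.fit_transform(features)   # (n_frames, n_components)
    W = model.components_               # (n_components, n_filters)
    return W, H


if __name__ == "__main__":
    x = np.random.randn(4 * 16000)      # stand-in for a 4 s mono signal at 16 kHz
    feats = scattering_first_order(x)
    W, H = nmf_templates(feats)
    print(feats.shape, W.shape, H.shape)
```

A second scattering order would, roughly, repeat the convolution-and-modulus step on each band's temporal envelope before averaging, which is what gives the pyramid its coarser temporal resolutions beyond the frame length.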
Cite
Text
Sprechmann et al. "Audio Source Separation with Discriminative Scattering Networks." International Conference on Learning Representations, 2015. doi:10.1007/978-3-319-22482-4_30
Markdown
[Sprechmann et al. "Audio Source Separation with Discriminative Scattering Networks." International Conference on Learning Representations, 2015.](https://mlanthology.org/iclr/2015/sprechmann2015iclr-audio/) doi:10.1007/978-3-319-22482-4_30
BibTeX
@inproceedings{sprechmann2015iclr-audio,
title = {{Audio Source Separation with Discriminative Scattering Networks}},
author = {Sprechmann, Pablo and Bruna, Joan and LeCun, Yann},
booktitle = {International Conference on Learning Representations},
year = {2015},
doi = {10.1007/978-3-319-22482-4_30},
url = {https://mlanthology.org/iclr/2015/sprechmann2015iclr-audio/}
}