Unsupervised Speech Separation Using Mixtures of Mixtures

Abstract

Supervised approaches to single-channel speech separation rely on synthetic mixtures, so that the individual sources can be used as targets. Good performance depends upon how well the synthetic mixture data match real mixtures. However, matching synthetic data to the acoustic properties and distribution of sounds in a target domain can be challenging. Instead, we propose an unsupervised method that requires only single-channel acoustic mixtures, without ground-truth source signals. In this method, existing mixtures are mixed together to form a mixture of mixtures, which the model separates into latent sources. We propose a novel loss that allows the latent sources to be remixed to approximate the original mixtures. Experiments show that this method can achieve competitive performance on speech separation compared to supervised methods. In a semi-supervised learning setting, our method enables domain adaptation by incorporating unsupervised mixtures from a matched domain. In particular, we demonstrate that significant improvement to reverberant speech separation performance can be achieved by incorporating reverberant mixtures.
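The remixing idea in the abstract can be illustrated with a minimal sketch. It assumes two reference mixtures and a model that has already produced a fixed number of estimated sources from their sum. The function name `remix_loss`, the exhaustive search over assignments, and the mean squared error are illustrative assumptions, not details taken from the abstract. Each estimated source is assigned to exactly one of the two mixtures, and the assignment whose remix best approximates the original mixtures determines the loss.

```python
import itertools

import numpy as np


def remix_loss(mix1, mix2, est_sources):
    """Hypothetical remixing loss for a mixture of two mixtures.

    mix1, mix2:   reference mixtures, each of shape (T,)
    est_sources:  estimated latent sources, shape (M, T)

    Returns (best_loss, best_assignment), where best_assignment[m] is 0 or 1
    and names the mixture that source m was remixed into.
    """
    num_sources = est_sources.shape[0]
    targets = np.stack([mix1, mix2])  # shape (2, T)
    best_loss, best_assign = np.inf, None

    # Try every way of assigning each source to exactly one mixture (2**M cases).
    for assign in itertools.product((0, 1), repeat=num_sources):
        # Binary mixing matrix: column m has a single 1 in row assign[m].
        mixing = np.zeros((2, num_sources))
        mixing[list(assign), np.arange(num_sources)] = 1.0

        remixed = mixing @ est_sources  # shape (2, T)
        # Assumption: mean squared error stands in for the signal-level loss.
        loss = float(np.mean((targets - remixed) ** 2))

        if loss < best_loss:
            best_loss, best_assign = loss, assign

    return best_loss, best_assign
```

In training, `est_sources` would come from a separation model applied to `mix1 + mix2`, and the minimum over assignments would be backpropagated; here the function only evaluates the loss for given estimates. The exhaustive search grows as 2**M, which is tolerable only for a small number of output sources.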

Cite

Text

Wisdom et al. "Unsupervised Speech Separation Using Mixtures of Mixtures." ICML 2020 Workshops: SAS, 2020.

Markdown

[Wisdom et al. "Unsupervised Speech Separation Using Mixtures of Mixtures." ICML 2020 Workshops: SAS, 2020.](https://mlanthology.org/icmlw/2020/wisdom2020icmlw-unsupervised/)

BibTeX

@inproceedings{wisdom2020icmlw-unsupervised,
  title     = {{Unsupervised Speech Separation Using Mixtures of Mixtures}},
  author    = {Wisdom, Scott and Tzinis, Efthymios and Erdogan, Hakan and Weiss, Ron J. and Wilson, Kevin and Hershey, John R.},
  booktitle = {ICML 2020 Workshops: SAS},
  year      = {2020},
  url       = {https://mlanthology.org/icmlw/2020/wisdom2020icmlw-unsupervised/}
}