Variational Speech Separation of More Sources than Mixtures

Abstract

We present a novel structured variational inference algorithm for probabilistic speech separation. The algorithm is built upon a new generative probability model of speech production and mixing in the full spectral domain. The model uses a detailed probability model of speech trained in the magnitude spectral domain, and takes the position ensemble of the underlying sources as a natural, low-dimensional parameterization of the mixing process. The algorithm produces high-quality estimates of the underlying source configurations, even when there are more underlying sources than available microphone recordings. It also automatically recovers spectral phase estimates for all underlying speakers, so the source estimates can be transformed directly into the time domain to yield speech signals of high perceptual quality.

Cite

Text

Rennie et al. "Variational Speech Separation of More Sources than Mixtures." Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.

Markdown

[Rennie et al. "Variational Speech Separation of More Sources than Mixtures." Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.](https://mlanthology.org/aistats/2005/rennie2005aistats-variational/)

BibTeX

@inproceedings{rennie2005aistats-variational,
  title     = {{Variational Speech Separation of More Sources than Mixtures}},
  author    = {Rennie, Steven J. and Achan, Kannan and Frey, Brendan J. and Aarabi, Parham},
  booktitle = {Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics},
  year      = {2005},
  pages     = {293--300},
  volume    = {R5},
  url       = {https://mlanthology.org/aistats/2005/rennie2005aistats-variational/}
}