Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network

Abstract

Recent advances in the design of neural network architectures, in particular those specialized in modeling sequences, have provided significant improvements in speech separation performance. In this work, we propose to use a bio-inspired architecture called Fully Recurrent Convolutional Neural Network (FRCNN) to solve the separation task. This model contains bottom-up, top-down, and lateral connections to fuse information processed at various time scales, represented by stages. In contrast to the traditional approach of updating stages in parallel, we propose to first update the stages one by one in the bottom-up direction, then fuse information from adjacent stages simultaneously, and finally fuse information from all stages into the bottom stage. Experiments showed that this asynchronous updating scheme achieved significantly better results with far fewer parameters than the traditional synchronous updating scheme on speech separation. In addition, the proposed model achieved competitive or better results with high efficiency as compared with other state-of-the-art approaches on two benchmark datasets.
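
The three-step asynchronous updating scheme described above can be illustrated with a short sketch. The PyTorch module below is a minimal, hypothetical rendering of one such block (bottom-up updates, adjacent-stage fusion, then global fusion into the bottom stage); the class name, layer choices, and channel sizes are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn


class AsyncFRCNNBlock(nn.Module):
    """Sketch of the asynchronous updating scheme (hypothetical names/layers)."""

    def __init__(self, channels=64, num_stages=4):
        super().__init__()
        self.num_stages = num_stages
        # Bottom-up transitions: each stride-2 conv produces the next, coarser stage.
        self.bottom_up = nn.ModuleList(
            nn.Conv1d(channels, channels, kernel_size=5, stride=2, padding=2)
            for _ in range(num_stages - 1)
        )
        # Adjacent fusion: 1x1 convs that merge a stage with its finer/coarser neighbours.
        self.adjacent_fusion = nn.ModuleList(
            nn.Conv1d(channels * (3 if 0 < i < num_stages - 1 else 2), channels, kernel_size=1)
            for i in range(num_stages)
        )
        # Global fusion: merge all stages (upsampled to the bottom resolution) into one output.
        self.global_fusion = nn.Conv1d(channels * num_stages, channels, kernel_size=1)

    def forward(self, x):
        # Step 1: update the stages one by one in the bottom-up direction.
        stages = [x]
        for conv in self.bottom_up:
            stages.append(conv(stages[-1]))

        # Step 2: fuse information from adjacent stages simultaneously.
        fused = []
        for i in range(self.num_stages):
            neighbours = [stages[i]]
            if i > 0:  # downsample the finer neighbour to this stage's resolution
                neighbours.append(
                    nn.functional.adaptive_avg_pool1d(stages[i - 1], stages[i].shape[-1])
                )
            if i < self.num_stages - 1:  # upsample the coarser neighbour
                neighbours.append(
                    nn.functional.interpolate(stages[i + 1], size=stages[i].shape[-1])
                )
            fused.append(self.adjacent_fusion[i](torch.cat(neighbours, dim=1)))

        # Step 3: fuse information from all stages into the bottom stage.
        upsampled = [nn.functional.interpolate(f, size=x.shape[-1]) for f in fused]
        return self.global_fusion(torch.cat(upsampled, dim=1))
```

A forward pass on a (batch, channels, time) tensor returns features at the original time resolution; in a typical encoder-separator-decoder separation pipeline, such a block would sit inside the separator and feed a mask-estimation head.
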

Cite

Text

Hu et al. "Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network." Neural Information Processing Systems, 2021.

Markdown

[Hu et al. "Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network." Neural Information Processing Systems, 2021.](https://mlanthology.org/neurips/2021/hu2021neurips-speech/)

BibTeX

@inproceedings{hu2021neurips-speech,
  title     = {{Speech Separation Using an Asynchronous Fully Recurrent Convolutional Neural Network}},
  author    = {Hu, Xiaolin and Li, Kai and Zhang, Weiyi and Luo, Yi and Lemercier, Jean-Marie and Gerkmann, Timo},
  booktitle = {Neural Information Processing Systems},
  year      = {2021},
  url       = {https://mlanthology.org/neurips/2021/hu2021neurips-speech/}
}