Robust Front-End for Multi-Channel ASR Using Flow-Based Density Estimation

Abstract

For multi-channel speech recognition, speech enhancement techniques such as denoising or dereverberation are conventionally applied as a front-end processor. Deep learning-based front-ends using such techniques require pairs of aligned clean and noisy speech, which are generally obtained via data simulation. Recently, several joint optimization techniques have been proposed to train the front-end without parallel data within an end-to-end automatic speech recognition (ASR) scheme. However, the ASR objective alone is sub-optimal and insufficient for fully training the front-end, leaving room for improvement. In this paper, we propose a novel approach that incorporates flow-based density estimation into the robust front-end, using non-parallel clean and noisy speech. Experimental results on the CHiME-4 dataset show that the proposed method outperforms conventional techniques in which the front-end is trained only with the ASR objective.
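The core idea behind flow-based density estimation is the change-of-variables formula: an invertible transform maps data to a simple base distribution, and the exact log-density of the data is the base log-density of the mapped point plus the log-determinant of the inverse Jacobian. The sketch below illustrates this with a toy elementwise affine flow and a standard-Gaussian base; it is a minimal illustration of the general technique, not the flow architecture or training setup used in the paper (the function names and parameters `s`, `b` are hypothetical).

```python
import numpy as np

def standard_normal_logpdf(z):
    # Log-density of a standard Gaussian, summed over the last axis.
    return -0.5 * np.sum(z ** 2 + np.log(2 * np.pi), axis=-1)

def affine_flow_logp(x, s, b):
    """Exact log-density of x under a toy elementwise affine flow
    x = z * exp(s) + b with a standard-Gaussian base distribution.

    By the change-of-variables formula:
        log p_x(x) = log p_z(z) + log |det d z / d x|
                   = log p_z((x - b) * exp(-s)) - sum(s)
    """
    z = (x - b) * np.exp(-s)          # invert the flow
    log_det = -np.sum(s)              # log |det| of the inverse Jacobian
    return standard_normal_logpdf(z) + log_det

# Sanity check: with s = b = 0 the flow is the identity, so the flow
# log-density must equal the standard-Gaussian log-density directly.
x = np.array([[0.5, -1.0]])
s = np.zeros(2)
b = np.zeros(2)
lp = affine_flow_logp(x, s, b)
```

In a flow-based front-end, maximizing such an exact log-likelihood on (non-parallel) clean speech gives the enhancement network a density-matching signal that complements the ASR objective.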

Cite

Text

Kim et al. "Robust Front-End for Multi-Channel ASR Using Flow-Based Density Estimation." International Joint Conference on Artificial Intelligence, 2020. doi:10.24963/IJCAI.2020/518

Markdown

[Kim et al. "Robust Front-End for Multi-Channel ASR Using Flow-Based Density Estimation." International Joint Conference on Artificial Intelligence, 2020.](https://mlanthology.org/ijcai/2020/kim2020ijcai-robust/) doi:10.24963/IJCAI.2020/518

BibTeX

@inproceedings{kim2020ijcai-robust,
  title     = {{Robust Front-End for Multi-Channel ASR Using Flow-Based Density Estimation}},
  author    = {Kim, Hyeongju and Lee, Hyeonseung and Kang, Woo Hyun and Kim, Hyung Yong and Kim, Nam Soo},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {3744--3750},
  doi       = {10.24963/IJCAI.2020/518},
  url       = {https://mlanthology.org/ijcai/2020/kim2020ijcai-robust/}
}