Reference-Based Speech Enhancement via Feature Alignment and Fusion Network

Abstract

Speech enhancement aims to recover clean speech from a noisy input and can be classified into single speech enhancement and personalized speech enhancement. Personalized speech enhancement usually utilizes the speaker identity, extracted from the noisy speech itself (or from a clean reference speech), as a global embedding to guide the enhancement process. In contrast, we observe that utterances of the same speaker are correlated at the frame level of the short-time Fourier transform (STFT) spectrogram. Therefore, we propose reference-based speech enhancement via a feature alignment and fusion network (FAF-Net). Given a noisy speech and a clean reference speech spoken by the same speaker, we first propose a feature-level alignment strategy that warps the clean reference to the noisy speech at the frame level. Then, we fuse the reference features with the noisy features via a similarity-based fusion strategy. Finally, the fused features are passed to the decoder through skip connections, and the decoder generates the enhanced result. Experimental results demonstrate that the performance of the proposed FAF-Net is close to that of state-of-the-art speech enhancement methods on both the DNS and Voice Bank+DEMAND datasets. Our code is available at https://github.com/HieDean/FAF-Net.
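The alignment-and-fusion idea described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch rendering, not the authors' implementation (see the linked repository for that): frame-level features of the noisy speech and the clean reference are soft-aligned by cosine similarity, and the aligned reference is fused with the noisy features for the decoder's skip connection. The tensor shapes, the confidence weighting, and the name align_and_fuse are illustrative assumptions.

# A minimal sketch (not the authors' implementation) of frame-level
# similarity-based alignment and fusion between noisy and reference
# features. Shapes and the cosine-similarity soft attention are
# assumptions for illustration; the actual FAF-Net design may differ.
import torch
import torch.nn.functional as F


def align_and_fuse(noisy_feat: torch.Tensor, ref_feat: torch.Tensor) -> torch.Tensor:
    """Warp reference frames to the noisy frames by feature similarity,
    then fuse them with a similarity-weighted concatenation.

    noisy_feat: (B, C, T_n)  frame-level features of the noisy speech
    ref_feat:   (B, C, T_r)  frame-level features of the clean reference
    returns:    (B, 2*C, T_n) noisy features concatenated with the aligned reference
    """
    # Cosine similarity between every noisy frame and every reference frame.
    n = F.normalize(noisy_feat, dim=1)           # (B, C, T_n)
    r = F.normalize(ref_feat, dim=1)             # (B, C, T_r)
    sim = torch.einsum("bct,bcs->bts", n, r)     # (B, T_n, T_r)

    # Soft alignment: each noisy frame attends to similar reference frames.
    attn = sim.softmax(dim=-1)                   # (B, T_n, T_r)
    aligned = torch.einsum("bts,bcs->bct", attn, ref_feat)  # (B, C, T_n)

    # Similarity-based fusion: down-weight the aligned reference where its
    # best match is weak, then concatenate for the skip connection.
    conf = sim.max(dim=-1).values.unsqueeze(1)   # (B, 1, T_n)
    fused = torch.cat([noisy_feat, conf * aligned], dim=1)   # (B, 2C, T_n)
    return fused


if __name__ == "__main__":
    noisy = torch.randn(2, 64, 100)   # batch of 2, 64 channels, 100 frames
    ref = torch.randn(2, 64, 80)      # reference with a different length
    print(align_and_fuse(noisy, ref).shape)  # torch.Size([2, 128, 100])

Note that the reference may have a different number of frames than the noisy input; the soft attention handles this by letting every noisy frame pool over all reference frames, which matches the abstract's claim that alignment happens at the frame level rather than via a single global speaker embedding.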

Cite

Text

Yue et al. "Reference-Based Speech Enhancement via Feature Alignment and Fusion Network." AAAI Conference on Artificial Intelligence, 2022. doi:10.1609/aaai.v36i10.21419

Markdown

[Yue et al. "Reference-Based Speech Enhancement via Feature Alignment and Fusion Network." AAAI Conference on Artificial Intelligence, 2022.](https://mlanthology.org/aaai/2022/yue2022aaai-reference/) doi:10.1609/aaai.v36i10.21419

BibTeX

@inproceedings{yue2022aaai-reference,
  title     = {{Reference-Based Speech Enhancement via Feature Alignment and Fusion Network}},
  author    = {Yue, Huanjing and Duo, Wenxin and Peng, Xiulian and Yang, Jingyu},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {11648--11656},
  doi       = {10.1609/aaai.v36i10.21419},
  url       = {https://mlanthology.org/aaai/2022/yue2022aaai-reference/}
}