A Streaming End-to-End Framework for Spoken Language Understanding
Abstract
End-to-end spoken language understanding (SLU) has recently attracted increasing interest. Compared to the conventional tandem-based approach that combines speech recognition and language understanding as separate modules, the new approach extracts users' intentions directly from the speech signals, resulting in joint optimization and low latency. Such an approach, however, is typically designed to process one intent at a time, which forces users to take multiple rounds to fulfill their requirements while interacting with a dialogue system. In this paper, we propose a streaming end-to-end framework that can process multiple intentions in an online and incremental way. The backbone of our framework is a unidirectional RNN trained with the connectionist temporal classification (CTC) criterion. With this design, an intention can be identified as soon as sufficient evidence has been accumulated, and multiple intentions are identified sequentially. We evaluate our solution on the Fluent Speech Commands (FSC) dataset, where the detection accuracy is about 97% on all multi-intent settings. This result is comparable to the performance of state-of-the-art non-streaming models, but is achieved in an online and incremental way. We also apply our model to a keyword spotting task using the Google Speech Commands dataset, and the results are also highly promising.
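The incremental behaviour described in the abstract can be illustrated with a minimal sketch of greedy, frame-by-frame CTC decoding. This is not the authors' implementation: the function name `stream_intents`, the blank index `BLANK = 0`, the confidence `threshold`, and the toy posteriors are all assumptions made for illustration. The sketch only assumes that a unidirectional model produces one posterior vector per frame, so an intent can be emitted the moment its label first dominates a frame.

```python
from typing import Iterable, Iterator, Sequence, Tuple

BLANK = 0  # assumed index of the CTC blank symbol


def stream_intents(
    frames: Iterable[Sequence[float]], threshold: float = 0.5
) -> Iterator[Tuple[int, int]]:
    """Yield (frame_index, intent_id) as soon as an intent label fires.

    Greedy CTC decoding applied online: take the best label per frame,
    drop blanks, and collapse consecutive repeats. The threshold is a
    hypothetical addition that treats low-confidence frames as blank.
    """
    prev = BLANK
    for t, probs in enumerate(frames):
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] < threshold:
            best = BLANK
        # Emit only on a transition into a new non-blank label.
        if best != BLANK and best != prev:
            yield t, best
        prev = best


# Toy per-frame posteriors over [blank, intent 1, intent 2] (made-up values).
frames = [
    [0.90, 0.05, 0.05],
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.90, 0.05, 0.05],
    [0.10, 0.10, 0.80],
    [0.95, 0.03, 0.02],
]
events = list(stream_intents(frames))
print(events)  # [(1, 1), (4, 2)]
```

In this toy run the first intent is reported at frame 1 and the second at frame 4, without waiting for the end of the utterance, which is the sense in which multiple intents are identified sequentially.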
Cite
Text
Potdar et al. "A Streaming End-to-End Framework for Spoken Language Understanding." International Joint Conference on Artificial Intelligence, 2021. doi:10.24963/IJCAI.2021/538
Markdown
[Potdar et al. "A Streaming End-to-End Framework for Spoken Language Understanding." International Joint Conference on Artificial Intelligence, 2021.](https://mlanthology.org/ijcai/2021/potdar2021ijcai-streaming/) doi:10.24963/IJCAI.2021/538
BibTeX
@inproceedings{potdar2021ijcai-streaming,
title = {{A Streaming End-to-End Framework for Spoken Language Understanding}},
author = {Potdar, Nihal and Avila, Anderson Raymundo and Xing, Chao and Wang, Dong and Cao, Yiran and Chen, Xiao},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2021},
pages = {3906-3914},
doi = {10.24963/IJCAI.2021/538},
url = {https://mlanthology.org/ijcai/2021/potdar2021ijcai-streaming/}
}