Word-Level Speech Recognition with a Letter to Word Encoder
Abstract
We propose a direct-to-word sequence model which uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models including Connectionist Temporal Classification and encoder-decoder models with attention. We show our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy. This makes the model more efficient both for training and inference.
Cite
Text
Collobert et al. "Word-Level Speech Recognition with a Letter to Word Encoder." International Conference on Machine Learning, 2020.
Markdown
[Collobert et al. "Word-Level Speech Recognition with a Letter to Word Encoder." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/collobert2020icml-wordlevel/)
BibTeX
@inproceedings{collobert2020icml-wordlevel,
title = {{Word-Level Speech Recognition with a Letter to Word Encoder}},
author = {Collobert, Ronan and Hannun, Awni and Synnaeve, Gabriel},
booktitle = {International Conference on Machine Learning},
year = {2020},
pages = {2100--2110},
volume = {119},
url = {https://mlanthology.org/icml/2020/collobert2020icml-wordlevel/}
}