Word-Level Speech Recognition with a Letter to Word Encoder

Abstract

We propose a direct-to-word sequence model that uses a word network to learn word embeddings from letters. The word network can be integrated seamlessly with arbitrary sequence models, including Connectionist Temporal Classification and encoder-decoder models with attention. We show that our direct-to-word model can achieve word error rate gains over sub-word level models for speech recognition. We also show that our direct-to-word approach retains the ability to predict words not seen at training time, without any retraining. Finally, we demonstrate that a word-level model can use a larger stride than a sub-word level model while maintaining accuracy, making the model more efficient for both training and inference.
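The core idea can be illustrated with a minimal sketch: build each word's embedding from the embeddings of its letters, so that any word spelled with known letters, including words unseen at training time, receives an embedding without retraining. The architecture below (mean-pooled letter embeddings plus a linear projection, a toy lexicon, and a random stand-in for the acoustic encoding) is a simplification for illustration, not the paper's actual word network.

```python
import numpy as np

rng = np.random.default_rng(0)
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
LETTER_DIM, WORD_DIM = 8, 16

# Letter embedding table and a projection into the word-embedding space.
letter_emb = rng.normal(size=(len(ALPHABET), LETTER_DIM))
proj = rng.normal(size=(LETTER_DIM, WORD_DIM))

def word_embedding(word: str) -> np.ndarray:
    """Mean-pool the word's letter embeddings, then project."""
    idx = [ALPHABET.index(c) for c in word]
    return letter_emb[idx].mean(axis=0) @ proj

# Score an acoustic frame against a lexicon by dot product, as a
# word-level classifier would; the lexicon can grow at test time.
lexicon = ["cat", "dog", "catalog"]
embs = np.stack([word_embedding(w) for w in lexicon])
frame = rng.normal(size=WORD_DIM)  # stand-in for an acoustic encoding
best = lexicon[int(np.argmax(embs @ frame))]

# A word absent from training still gets an embedding, no retraining:
oov = word_embedding("zebra")
print(best, oov.shape)
```

Because scores are dot products between acoustic encodings and letter-derived word embeddings, extending the output vocabulary only requires spelling out new words, which is what lets the model predict unseen words.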

Cite

Text

Collobert et al. "Word-Level Speech Recognition with a Letter to Word Encoder." International Conference on Machine Learning, 2020.

Markdown

[Collobert et al. "Word-Level Speech Recognition with a Letter to Word Encoder." International Conference on Machine Learning, 2020.](https://mlanthology.org/icml/2020/collobert2020icml-wordlevel/)

BibTeX

@inproceedings{collobert2020icml-wordlevel,
  title     = {{Word-Level Speech Recognition with a Letter to Word Encoder}},
  author    = {Collobert, Ronan and Hannun, Awni and Synnaeve, Gabriel},
  booktitle = {International Conference on Machine Learning},
  year      = {2020},
  pages     = {2100--2110},
  volume    = {119},
  url       = {https://mlanthology.org/icml/2020/collobert2020icml-wordlevel/}
}