Performance Through Consistency: MS-TDNN's for Large Vocabulary Continuous Speech Recognition

Abstract

Connectionist speech recognition systems are often handicapped by an inconsistency between training and testing criteria. This problem is addressed by the Multi-State Time Delay Neural Network (MS-TDNN), a hierarchical phoneme and word classifier which uses DTW to modulate its connectivity pattern, and which is directly trained on word-level targets. The consistent use of word accuracy as a criterion during both training and testing leads to very high system performance, even with limited training data. Until now, the MS-TDNN has been applied primarily to small vocabulary recognition and word spotting tasks. In this paper we apply the architecture to large vocabulary continuous speech recognition, and demonstrate that our MS-TDNN outperforms all other systems that have been tested on the CMU Conference Registration database.
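The DTW alignment mentioned in the abstract can be illustrated with a minimal sketch. The following is not the paper's implementation, just a generic dynamic-time-warping recursion over a hypothetical frame-by-state cost matrix, of the kind used to align network output frames against a word's state sequence:

```python
import numpy as np

def dtw(dist):
    """Dynamic time warping over a frame-by-state cost matrix.

    dist[t, s] is the local cost of matching input frame t to model
    state s. Returns the cumulative cost of the best monotonic
    alignment path from (0, 0) to (T-1, S-1).
    """
    T, S = dist.shape
    D = np.full((T, S), np.inf)
    D[0, 0] = dist[0, 0]
    for t in range(1, T):
        for s in range(S):
            # allowed moves: stay in state s, or advance from state s-1
            best_prev = D[t - 1, s]
            if s > 0:
                best_prev = min(best_prev, D[t - 1, s - 1])
            D[t, s] = dist[t, s] + best_prev
    return D[-1, -1]

# toy example: 4 frames aligned against a 3-state word model
scores = np.array([[0.1, 0.9, 0.9],
                   [0.2, 0.1, 0.8],
                   [0.9, 0.2, 0.1],
                   [0.9, 0.8, 0.1]])
print(dtw(scores))  # best path cost: about 0.4
```

In the MS-TDNN, the resulting alignment path determines which phoneme-layer activations feed each word unit, so that word-level error can be backpropagated through the aligned frames.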

Cite

Text

Tebelskis and Waibel. "Performance Through Consistency: MS-TDNN's for Large Vocabulary Continuous Speech Recognition." Neural Information Processing Systems, 1992.

Markdown

[Tebelskis and Waibel. "Performance Through Consistency: MS-TDNN's for Large Vocabulary Continuous Speech Recognition." Neural Information Processing Systems, 1992.](https://mlanthology.org/neurips/1992/tebelskis1992neurips-performance/)

BibTeX

@inproceedings{tebelskis1992neurips-performance,
  title     = {{Performance Through Consistency: MS-TDNN's for Large Vocabulary Continuous Speech Recognition}},
  author    = {Tebelskis, Joe and Waibel, Alex},
  booktitle = {Neural Information Processing Systems},
  year      = {1992},
  pages     = {696--703},
  url       = {https://mlanthology.org/neurips/1992/tebelskis1992neurips-performance/}
}