Conversational Speech Transcription Using Context-Dependent Deep Neural Networks

Yu, Dong; Seide, Frank; Li, Gang

doi:10.21437/Interspeech.2011-169

ICML 2012

Abstract

We apply the recently proposed Context-Dependent Deep-Neural-Network HMMs, or CD-DNN-HMMs, to speech-to-text transcription. For single-pass speaker-independent recognition on the RT03S Fisher portion of the phone-call transcription benchmark (Switchboard), the word-error rate is reduced from 27.4%, obtained by discriminatively trained Gaussian-mixture HMMs, to 18.5%, a 33% relative improvement. CD-DNN-HMMs combine classic artificial-neural-network HMMs with traditional tied-state triphones and deep-belief-network pre-training. They had previously been shown to reduce errors by 16% relative when trained on tens of hours of data using hundreds of tied states. This paper takes CD-DNN-HMMs further and applies them to transcription using over 300 hours of training data, over 9000 tied states, and up to 9 hidden layers, and demonstrates how sparseness can be exploited. On four less well-matched transcription tasks, we observe relative error reductions of 22–28%.

Index Terms: speech recognition, deep belief networks, deep neural networks
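The relative improvement quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `relative_reduction` is a hypothetical helper, not something from the paper, and the inputs are the rounded word-error rates given in the abstract.

```python
def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the relative error reduction as a fraction of the baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Rounded word-error rates (in percent) as quoted in the abstract.
gmm_wer = 27.4  # discriminatively trained Gaussian-mixture HMMs
dnn_wer = 18.5  # CD-DNN-HMMs

reduction = relative_reduction(gmm_wer, dnn_wer)
print(f"absolute: {gmm_wer - dnn_wer:.1f} points, relative: {100 * reduction:.1f}%")
```

With these rounded inputs the result is about 32.5%, close to the 33% quoted. The small gap may come from rounding in the published figures.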

Cite

Text

Yu et al. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." International Conference on Machine Learning, 2012. doi:10.21437/Interspeech.2011-169

Markdown

[Yu et al. "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks." International Conference on Machine Learning, 2012.](https://mlanthology.org/icml/2012/yu2012icml-conversational/) doi:10.21437/Interspeech.2011-169

BibTeX

@inproceedings{yu2012icml-conversational,
  title     = {{Conversational Speech Transcription Using Context-Dependent Deep Neural Networks}},
  author    = {Yu, Dong and Seide, Frank and Li, Gang},
  booktitle = {International Conference on Machine Learning},
  year      = {2012},
  doi       = {10.21437/Interspeech.2011-169},
  url       = {https://mlanthology.org/icml/2012/yu2012icml-conversational/}
}