Speech Recognition with Dynamic Bayesian Networks

Zweig, Geoffrey; Russell, Stuart

AAAI 1998 pp. 173-180

Abstract

Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. Th...
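As a rough illustration of the factored state representation the abstract describes, the sketch below computes the likelihood of a discrete observation sequence in a toy DBN with two hidden variables per frame: a phonetic state `q` and a single context variable `c`. It is not the authors' model. The two-chain structure, the function names, and every probability value are assumptions made up for this example.

```python
from itertools import product


def forward_likelihood(obs, prior_q, prior_c, trans_q, trans_c, emit):
    """Return P(obs) for a toy DBN with two independent hidden chains.

    q is a phonetic state and c is a context variable (both hypothetical).
    trans_q[i][j] = P(q_t = j | q_{t-1} = i), likewise trans_c for c.
    emit[q][c][o] = P(o_t = o | q_t = q, c_t = c).
    """
    nq, nc = len(prior_q), len(prior_c)
    # alpha[q][c] = P(o_1..o_t, q_t = q, c_t = c)
    alpha = [
        [prior_q[q] * prior_c[c] * emit[q][c][obs[0]] for c in range(nc)]
        for q in range(nq)
    ]
    for o in obs[1:]:
        # The joint transition factors into one term per chain.
        alpha = [
            [
                emit[j][l][o]
                * sum(
                    alpha[i][k] * trans_q[i][j] * trans_c[k][l]
                    for i in range(nq)
                    for k in range(nc)
                )
                for l in range(nc)
            ]
            for j in range(nq)
        ]
    return sum(sum(row) for row in alpha)


# Made-up parameters: 2 phonetic states, 2 context values, 2 output symbols.
PRIOR_Q = [0.6, 0.4]
PRIOR_C = [0.5, 0.5]
TRANS_Q = [[0.7, 0.3], [0.2, 0.8]]
TRANS_C = [[0.9, 0.1], [0.1, 0.9]]
EMIT = [
    [[0.8, 0.2], [0.5, 0.5]],
    [[0.3, 0.7], [0.1, 0.9]],
]


def total_probability(length):
    """Sum P(obs) over every observation sequence of the given length."""
    return sum(
        forward_likelihood(list(seq), PRIOR_Q, PRIOR_C, TRANS_Q, TRANS_C, EMIT)
        for seq in product(range(2), repeat=length)
    )
```

An ordinary HMM over the product state `(q, c)` would need a full 4 x 4 transition table. Here the two 2 x 2 tables describe the same dynamics with fewer parameters, which is the kind of saving a factored representation is meant to provide.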

Cite

Text

Zweig and Russell. "Speech Recognition with Dynamic Bayesian Networks." AAAI Conference on Artificial Intelligence, 1998.

Markdown

[Zweig and Russell. "Speech Recognition with Dynamic Bayesian Networks." AAAI Conference on Artificial Intelligence, 1998.](https://mlanthology.org/aaai/1998/zweig1998aaai-speech/)

BibTeX

@inproceedings{zweig1998aaai-speech,
  title     = {{Speech Recognition with Dynamic Bayesian Networks}},
  author    = {Zweig, Geoffrey and Russell, Stuart},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {1998},
  pages     = {173--180},
  url       = {https://mlanthology.org/aaai/1998/zweig1998aaai-speech/}
}