Speech Recognition with Dynamic Bayesian Networks
Abstract
Dynamic Bayesian networks (DBNs) are a useful tool for representing complex stochastic processes. Recent developments in inference and learning in DBNs allow their use in real-world applications. In this paper, we apply DBNs to the problem of speech recognition. The factored state representation enabled by DBNs allows us to explicitly represent long-term articulatory and acoustic context in addition to the phonetic-state information maintained by hidden Markov models (HMMs). Furthermore, it enables us to model the short-term correlations among multiple observation streams within single time-frames. Given a DBN structure capable of representing these long- and short-term correlations, we applied the EM algorithm to learn models with up to 500,000 parameters. The use of structured DBN models decreased the error rate by 12 to 29% on a large-vocabulary isolated-word recognition task, compared to a discrete HMM; it also improved significantly on other published results for the same task. Th...
Cite
Text
Zweig and Russell. "Speech Recognition with Dynamic Bayesian Networks." AAAI Conference on Artificial Intelligence, 1998.
Markdown
[Zweig and Russell. "Speech Recognition with Dynamic Bayesian Networks." AAAI Conference on Artificial Intelligence, 1998.](https://mlanthology.org/aaai/1998/zweig1998aaai-speech/)
BibTeX
@inproceedings{zweig1998aaai-speech,
title = {{Speech Recognition with Dynamic Bayesian Networks}},
author = {Zweig, Geoffrey and Russell, Stuart},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {1998},
pages = {173--180},
url = {https://mlanthology.org/aaai/1998/zweig1998aaai-speech/}
}