Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

Abstract

Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonal-covariance Gaussians: GRBMs treat different components of the acoustic input vector as conditionally independent given the hidden state. The mean-covariance restricted Boltzmann machine (mcRBM), first introduced for modeling natural images, is a much more representationally efficient and powerful way of modeling the covariance structure of speech data. Every configuration of the precision units of the mcRBM specifies a different precision matrix for the conditional distribution over the acoustic space. In this work, we use the mcRBM to learn features of speech data that serve as input into a standard DBN. The mcRBM features combined with DBNs allow us to achieve a phone error rate of 20.5%, which is superior to all published results on speaker-independent TIMIT to date.
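
The statement that each configuration of the precision units selects a different precision matrix can be illustrated with a small numerical sketch. This is not the paper's implementation: the parameterization `C diag(P h) C^T + I`, the names `C`, `P`, and `precision_matrix`, the non-negative sign convention for `P`, and the toy dimensions are all assumptions chosen for illustration.

```python
import numpy as np

def precision_matrix(C, P, h):
    """Illustrative precision matrix C diag(P h) C^T + I for a binary vector h.

    C: (n_visible, n_factors) filter matrix (hypothetical name).
    P: (n_factors, n_hidden) non-negative pooling matrix (assumed convention).
    h: (n_hidden,) binary configuration of the precision units.
    """
    factor_weights = P @ h  # one non-negative weight per factor
    return C @ np.diag(factor_weights) @ C.T + np.eye(C.shape[0])

rng = np.random.default_rng(0)
n_visible, n_factors, n_hidden = 4, 6, 3  # toy sizes, not from the paper
C = rng.normal(size=(n_visible, n_factors))
P = np.abs(rng.normal(size=(n_factors, n_hidden)))

# Two different hidden configurations give two different precision matrices.
h_a = np.array([1.0, 0.0, 0.0])
h_b = np.array([0.0, 1.0, 1.0])
prec_a = precision_matrix(C, P, h_a)
prec_b = precision_matrix(C, P, h_b)

# A GRBM-style conditional would instead keep a fixed diagonal precision,
# so the input components stay conditionally independent given the hiddens.
grbm_precision = np.eye(n_visible)
```

In this sketch the off-diagonal entries of `prec_a` and `prec_b` are generally nonzero and change with the hidden configuration, whereas the diagonal `grbm_precision` cannot express any dependence between input components.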

Cite

Text

Dahl et al. "Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine." Neural Information Processing Systems, 2010.

Markdown

[Dahl et al. "Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine." Neural Information Processing Systems, 2010.](https://mlanthology.org/neurips/2010/dahl2010neurips-phone/)

BibTeX

@inproceedings{dahl2010neurips-phone,
  title     = {{Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine}},
  author    = {Dahl, George and Ranzato, Marc'Aurelio and Mohamed, Abdel-rahman and Hinton, Geoffrey E.},
  booktitle = {Neural Information Processing Systems},
  year      = {2010},
  pages     = {469--477},
  url       = {https://mlanthology.org/neurips/2010/dahl2010neurips-phone/}
}