An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition

Abstract

This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. It is based on two other Markovian models, namely Asynchronous Input/Output Hidden Markov Models and Pair Hidden Markov Models. An EM algorithm to train the model is presented, as well as a Viterbi decoder that can be used to obtain the optimal state sequence along with the alignment between the two sequences. The model has been tested on an audio-visual speech recognition task using the M2VTS database and yielded robust performance under various noise conditions.
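The decoding idea from the abstract can be illustrated with a toy sketch: a pair-HMM-style Viterbi recursion over a grid of (audio position, video position), where in each state the model either emits an audio frame together with the next video frame (advancing both streams) or emits the audio frame alone (letting the video pointer stall). This is only a minimal illustration, not the paper's actual formulation; all names (`async_viterbi`, the 2-state toy model, the per-state advance probability `log_epsilon`) are assumptions made for the example.

```python
import math

NEG_INF = float("-inf")

def async_viterbi(x, y, n_states, log_init, log_trans,
                  log_emit_pair, log_emit_x, log_epsilon):
    """Joint Viterbi decoding for an asynchronous HMM (toy sketch).

    x : audio frames (one is always emitted per step)
    y : video frames, len(y) <= len(x); a video frame is emitted only
        when the model chooses to advance the video stream
    log_emit_pair(q, xt, ys) : log p(x_t, y_s | state q)
    log_emit_x(q, xt)        : log p(x_t | state q) when video stalls
    log_epsilon[q]           : log-prob of advancing the video stream
                               in state q (must be < 0, i.e. prob < 1)

    Returns the best log-prob and the aligned path as a list of
    (audio index, video index or None, state) triples.
    """
    T, S = len(x), len(y)
    # V[t][s][q] = best log-prob after consuming x[:t], y[:s], ending in q
    V = [[[NEG_INF] * n_states for _ in range(S + 1)] for _ in range(T + 1)]
    back = {}
    for q in range(n_states):
        V[0][0][q] = log_init[q]
    for t in range(1, T + 1):
        for s in range(min(t, S) + 1):  # each pair move also consumes one x
            for q in range(n_states):
                best, arg = NEG_INF, None
                for p in range(n_states):
                    # move 1: emit (x[t-1], y[s-1]) jointly, advance both
                    if s >= 1:
                        cand = (V[t - 1][s - 1][p] + log_trans[p][q]
                                + log_epsilon[q]
                                + log_emit_pair(q, x[t - 1], y[s - 1]))
                        if cand > best:
                            best, arg = cand, (t - 1, s - 1, p, True)
                    # move 2: emit x[t-1] alone, video pointer stalls
                    cand = (V[t - 1][s][p] + log_trans[p][q]
                            + math.log1p(-math.exp(log_epsilon[q]))
                            + log_emit_x(q, x[t - 1]))
                    if cand > best:
                        best, arg = cand, (t - 1, s, p, False)
                V[t][s][q] = best
                back[(t, s, q)] = arg
    # both streams must be fully consumed at the end
    q_best = max(range(n_states), key=lambda q: V[T][S][q])
    score = V[T][S][q_best]
    path, t, s, q = [], T, S, q_best
    while t > 0:
        pt, ps, pq, emitted_pair = back[(t, s, q)]
        path.append((t - 1, s - 1 if emitted_pair else None, q))
        t, s, q = pt, ps, pq
    path.reverse()
    return score, path

# Hypothetical 2-state toy: binary observations, state q prefers symbol q.
def log_emit_pair(q, xt, ys):
    return (math.log(0.8 if xt == q else 0.2)
            + math.log(0.8 if ys == q else 0.2))

def log_emit_x(q, xt):
    return math.log(0.8 if xt == q else 0.2)

score, path = async_viterbi(
    x=[0, 0, 1, 1], y=[0, 1], n_states=2,
    log_init=[math.log(0.5)] * 2,
    log_trans=[[math.log(0.7), math.log(0.3)],
               [math.log(0.3), math.log(0.7)]],
    log_emit_pair=log_emit_pair, log_emit_x=log_emit_x,
    log_epsilon=[math.log(0.5)] * 2)
```

Exhaustive search over the (t, s, state) grid keeps the example transparent; the recursion runs in O(T·S·Q²), matching the cost one would expect from a pair-HMM alignment rather than a standard O(T·Q²) Viterbi pass.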

Cite

Text

Bengio. "An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition." Neural Information Processing Systems, 2002.

Markdown

[Bengio. "An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition." Neural Information Processing Systems, 2002.](https://mlanthology.org/neurips/2002/bengio2002neurips-asynchronous/)

BibTeX

@inproceedings{bengio2002neurips-asynchronous,
  title     = {{An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition}},
  author    = {Bengio, Samy},
  booktitle = {Neural Information Processing Systems},
  year      = {2002},
  pages     = {1237--1244},
  url       = {https://mlanthology.org/neurips/2002/bengio2002neurips-asynchronous/}
}