An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
Abstract
This paper presents a novel Hidden Markov Model architecture to model the joint probability of pairs of asynchronous sequences describing the same event. It is based on two other Markovian models, namely Asynchronous Input/Output Hidden Markov Models and Pair Hidden Markov Models. An EM algorithm to train the model is presented, as well as a Viterbi decoder that can be used to obtain both the optimal state sequence and the alignment between the two sequences. The model has been tested on an audio-visual speech recognition task using the M2VTS database and yielded robust performance under various noise conditions.
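The abstract describes the Viterbi decoder only in words. As a rough illustration, here is a minimal Python sketch of a Viterbi recursion for an asynchronous HMM, assuming a formulation in which every step emits the next frame x_t of the longer sequence (for example audio) and state i additionally emits the next frame y_s of the shorter sequence (for example video) with a state-dependent probability eps[i]. The function and parameter names (`async_viterbi`, `eps`, `log_emit_xy`, `log_emit_x`) are hypothetical, the emission models are left as callables, and this is not the paper's implementation.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

NEG_INF = float("-inf")


def safe_log(p: float) -> float:
    """Natural log with log(0) mapped to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def async_viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    eps: Sequence[float],
    log_emit_xy: Callable[[int, int, int], float],
    log_emit_x: Callable[[int, int], float],
    T: int,
    S: int,
) -> Tuple[float, List[int], List[Optional[int]]]:
    """Viterbi over (state i, x time t, y count s) for an ASSUMED AHMM.

    Assumption: every step emits x_t; state i also emits the next y
    symbol with probability eps[i]. log_trans[j][i] = log P(i | j).
    Returns (best log score, state path, y index aligned to each t or None).
    """
    if S > T:
        raise ValueError("sketch assumes y is not longer than x")
    N = len(log_init)
    # delta[t][s][i]: best log score after emitting x_0..x_t and y_0..y_{s-1}, ending in state i.
    delta = [[[NEG_INF] * N for _ in range(S + 1)] for _ in range(T)]
    back = [[[None] * N for _ in range(S + 1)] for _ in range(T)]
    for i in range(N):
        delta[0][0][i] = log_init[i] + safe_log(1.0 - eps[i]) + log_emit_x(i, 0)
        if S >= 1:
            delta[0][1][i] = log_init[i] + safe_log(eps[i]) + log_emit_xy(i, 0, 0)
    for t in range(1, T):
        for s in range(min(t + 1, S) + 1):  # at most t+1 y symbols by time t
            for i in range(N):
                cands = []
                for j in range(N):
                    # (a) state i emits x_t alone: y count stays at s.
                    cands.append((delta[t - 1][s][j] + log_trans[j][i]
                                  + safe_log(1.0 - eps[i]) + log_emit_x(i, t), j, False))
                    # (b) state i emits x_t jointly with y_{s-1}: count goes s-1 -> s.
                    if s >= 1:
                        cands.append((delta[t - 1][s - 1][j] + log_trans[j][i]
                                      + safe_log(eps[i]) + log_emit_xy(i, t, s - 1), j, True))
                score, j_best, used_y = max(cands, key=lambda c: c[0])
                delta[t][s][i] = score
                back[t][s][i] = (j_best, used_y)
    # Both sequences must be fully consumed at the last x frame.
    i = max(range(N), key=lambda k: delta[T - 1][S][k])
    best = delta[T - 1][S][i]
    if best == NEG_INF:
        raise ValueError("no alignment has nonzero probability")
    path: List[int] = [0] * T
    align: List[Optional[int]] = [None] * T
    s = S
    for t in range(T - 1, 0, -1):  # follow backpointers down to t = 1
        path[t] = i
        j_best, used_y = back[t][s][i]
        if used_y:
            align[t] = s - 1
            s -= 1
        i = j_best
    path[0] = i
    if s == 1:  # the first frame consumed y_0
        align[0] = 0
    return best, path, align
```

The search runs over a three-dimensional lattice of state, x position and y position, so this sketch costs O(N²·T·S) time. The returned `align` list records which y frame, if any, each x frame was paired with, which corresponds to the alignment the abstract says the decoder recovers.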
Cite
Bengio. "An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition." Neural Information Processing Systems, 2002.
@inproceedings{bengio2002neurips-asynchronous,
title = {{An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition}},
author = {Bengio, Samy},
booktitle = {Neural Information Processing Systems},
year = {2002},
pages = {1237--1244},
url = {https://mlanthology.org/neurips/2002/bengio2002neurips-asynchronous/}
}