A Comparison of Active Shape Model and Scale Decomposition Based Features for Visual Speech Recognition

Abstract

Two quite different strategies for characterising mouth shapes for visual speech recognition (lipreading) are compared. The first extracts the parameters needed to fit an active shape model (ASM) to the outline of the lips. The second derives a feature from a one-dimensional multiscale spatial analysis (MSA) of the mouth region, using a new processor based on mathematical morphology and median filtering. In multispeaker trials using image data only, accuracy on a letters database is 45% with MSA and 19% with ASM; on a simpler digits database both methods reach 77%. These scores matter because separate work has shown that even quite low recognition accuracies in the vision channel can be combined with the audio channel to give improved composite performance [16].
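The MSA feature described above rests on a one-dimensional multiscale decomposition built from median filtering: each scale's detail is what a growing median filter removes from the signal. The sketch below is only an illustration of that general idea, not the authors' processor; the window sizes, the toy scan-line signal, and the helper names are assumptions made for the example.

```python
import numpy as np

def median_filter_1d(x, width):
    """Median-filter a 1-D signal with an odd window length `width`,
    padding the ends by edge replication."""
    pad = width // 2
    padded = np.pad(x, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1)

def sieve(x, max_scale):
    """Decompose `x` into per-scale detail channels by applying median
    filters of increasing width.  By construction,
    x == residual + sum(channels), so the decomposition is exact."""
    channels = []
    current = np.asarray(x, dtype=float)
    for s in range(1, max_scale + 1):
        smoothed = median_filter_1d(current, 2 * s + 1)
        channels.append(current - smoothed)  # detail removed at scale s
        current = smoothed
    return channels, current  # residual is the final smoothed signal

# Toy 1-D intensity profile standing in for a mouth-region scan line.
signal = np.array([0, 0, 5, 0, 0, 3, 3, 0, 0, 1, 1, 1, 0, 0], dtype=float)
channels, residual = sieve(signal, 3)
reconstruction = residual + sum(channels)
print(np.allclose(reconstruction, signal))  # exact by construction
```

In a lipreading pipeline of this kind, the per-scale channels (rather than raw pixels) would be summarised into a compact feature vector for the recogniser.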

Cite

Text

Matthews et al. "A Comparison of Active Shape Model and Scale Decomposition Based Features for Visual Speech Recognition." European Conference on Computer Vision, 1998. doi:10.1007/BFB0054762

Markdown

[Matthews et al. "A Comparison of Active Shape Model and Scale Decomposition Based Features for Visual Speech Recognition." European Conference on Computer Vision, 1998.](https://mlanthology.org/eccv/1998/matthews1998eccv-comparison/) doi:10.1007/BFB0054762

BibTeX

@inproceedings{matthews1998eccv-comparison,
  title     = {{A Comparison of Active Shape Model and Scale Decomposition Based Features for Visual Speech Recognition}},
  author    = {Matthews, Iain A. and Bangham, J. Andrew and Harvey, Richard W. and Cox, Stephen J.},
  booktitle = {European Conference on Computer Vision},
  year      = {1998},
  pages     = {514--528},
  doi       = {10.1007/BFB0054762},
  url       = {https://mlanthology.org/eccv/1998/matthews1998eccv-comparison/}
}