Dynamic Features for Visual Speechreading: A Systematic Comparison

Abstract

Humans use visual as well as auditory speech signals to recognize spoken words. A variety of systems have been investigated for performing this task. The main purpose of this research was to systematically compare the performance of a range of dynamic visual features on a speechreading task. We have found that normalization of images to eliminate variation due to translation, scale, and planar rotation yielded substantial improvements in generalization performance regardless of the visual representation used. In addition, the dynamic information in the difference between successive frames yielded better performance than optical-flow based approaches, and compression by local low-pass filtering, surprisingly, worked better than global principal components analysis (PCA). These results are examined and possible explanations are explored.
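The two best-performing ingredients identified in the abstract, frame differencing and compression by local low-pass filtering, can be sketched as follows. This is a minimal illustration of the general techniques, not the authors' exact pipeline; the block size and array shapes are assumptions for the example.

```python
import numpy as np

def delta_frames(frames):
    """Dynamic features as differences between successive frames.

    frames: array of shape (T, H, W), a grayscale video clip.
    Returns an array of shape (T-1, H, W).
    """
    return np.diff(frames.astype(float), axis=0)

def local_lowpass(image, block=4):
    """Compress an image by local low-pass filtering: average each
    non-overlapping block x block patch (a crude spatial low-pass
    filter followed by downsampling)."""
    h, w = image.shape
    h2, w2 = h - h % block, w - w % block  # trim to a multiple of block
    patches = image[:h2, :w2].reshape(h2 // block, block, w2 // block, block)
    return patches.mean(axis=(1, 3))

# Toy usage: a random 10-frame, 16x16 "video".
video = np.random.rand(10, 16, 16)
deltas = delta_frames(video)                               # shape (9, 16, 16)
features = np.stack([local_lowpass(d, 4) for d in deltas]) # shape (9, 4, 4)
```

Unlike global PCA, the local averaging preserves the rough spatial layout of the mouth region, which may be one reason such filtering generalized well here.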

Cite

Text

Gray et al. "Dynamic Features for Visual Speechreading: A Systematic Comparison." Neural Information Processing Systems, 1996.

Markdown

[Gray et al. "Dynamic Features for Visual Speechreading: A Systematic Comparison." Neural Information Processing Systems, 1996.](https://mlanthology.org/neurips/1996/gray1996neurips-dynamic/)

BibTeX

@inproceedings{gray1996neurips-dynamic,
  title     = {{Dynamic Features for Visual Speechreading: A Systematic Comparison}},
  author    = {Gray, Michael S. and Movellan, Javier R. and Sejnowski, Terrence J.},
  booktitle = {Neural Information Processing Systems},
  year      = {1996},
  pages     = {751--757},
  url       = {https://mlanthology.org/neurips/1996/gray1996neurips-dynamic/}
}