Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications
Abstract
Developments in dynamic contour tracking permit sparse representation of the outlines of moving contours. Given the increasing computing power of general-purpose workstations it is now possible to track human faces and parts of faces in real-time without special hardware. This paper describes a real-time lip tracker that uses a Kalman filter based dynamic contour to track the outline of the lips. Two alternative lip trackers, one that tracks lips from a profile view and the other from a frontal view, were developed to extract visual speech recognition features from the lip contour. In both cases, visual features have been incorporated into an acoustic automatic speech recogniser. Tests on small isolated-word vocabularies using a dynamic time warping based audio-visual recogniser demonstrate that real-time, contour-based lip tracking can be used to supplement acoustic-only speech recognisers enabling robust recognition of speech in the presence of acoustic noise.
Cite
Text
Kaucic et al. "Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications." European Conference on Computer Vision, 1996. doi:10.1007/3-540-61123-1_154
Markdown
[Kaucic et al. "Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications." European Conference on Computer Vision, 1996.](https://mlanthology.org/eccv/1996/kaucic1996eccv-real/) doi:10.1007/3-540-61123-1_154
BibTeX
@inproceedings{kaucic1996eccv-real,
title = {{Real-Time Lip Tracking for Audio-Visual Speech Recognition Applications}},
author = {Kaucic, Robert and Dalton, Barney and Blake, Andrew},
booktitle = {European Conference on Computer Vision},
year = {1996},
pages = {376--387},
doi = {10.1007/3-540-61123-1_154},
url = {https://mlanthology.org/eccv/1996/kaucic1996eccv-real/}
}