Audio-Visual Person Verification

Abstract

In this paper we investigate the benefits of classifier combination (fusion) for a multimodal personal identity verification system. The system uses frontal face images and speech. We show that a sophisticated fusion strategy enables the system to outperform its facial and vocal modules taken separately. Both trained linear weighted schemes and fusion by a Support Vector Machine classifier lead to a significant reduction in total error rates. The complete system is tested on data from a publicly available audio-visual database (XM2VTS, 295 subjects) according to a published protocol.
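The linear weighted fusion the abstract mentions can be sketched as a weighted sum of per-modality match scores compared against a decision threshold. This is a minimal illustration only: the weights, threshold, and score values below are hypothetical placeholders, whereas in the paper the fusion parameters are trained from data.

```python
# Sketch of linear weighted score-level fusion for audio-visual verification.
# Weights and threshold are hypothetical; the paper trains these from data.

def fuse_scores(face_score: float, speech_score: float,
                w_face: float = 0.6, w_speech: float = 0.4,
                threshold: float = 0.5) -> bool:
    """Accept the claimed identity if the weighted sum of the facial
    and vocal match scores reaches the decision threshold."""
    fused = w_face * face_score + w_speech * speech_score
    return fused >= threshold

# A strong face score can compensate for a weaker speech score:
print(fuse_scores(0.9, 0.3))  # 0.6*0.9 + 0.4*0.3 = 0.66 >= 0.5 -> True
print(fuse_scores(0.2, 0.2))  # 0.6*0.2 + 0.4*0.2 = 0.20 <  0.5 -> False
```

The SVM variant replaces this fixed linear rule with a classifier trained on the two-dimensional vector of modality scores, which can learn a non-linear accept/reject boundary.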

Cite

Text

Ben-Yacoub et al. "Audio-Visual Person Verification." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1999. doi:10.1109/CVPR.1999.786997

Markdown

[Ben-Yacoub et al. "Audio-Visual Person Verification." IEEE/CVF Conference on Computer Vision and Pattern Recognition, 1999.](https://mlanthology.org/cvpr/1999/benyacoub1999cvpr-audio/) doi:10.1109/CVPR.1999.786997

BibTeX

@inproceedings{benyacoub1999cvpr-audio,
  title     = {{Audio-Visual Person Verification}},
  author    = {Ben-Yacoub, S. and Luettin, Juergen and Jonsson, Kenneth and Matas, Jiri and Kittler, Josef},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {1999},
  pages     = {1580--1585},
  doi       = {10.1109/CVPR.1999.786997},
  url       = {https://mlanthology.org/cvpr/1999/benyacoub1999cvpr-audio/}
}