FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks
Abstract
FaceSync is an optimal linear algorithm that finds the degree of synchronization between the audio and image recordings of a human speaker. Using canonical correlation, it finds the best direction to combine all the audio and image data, projecting them onto a single axis. FaceSync uses Pearson's correlation to measure the degree of synchronization between the audio and image data. We derive the optimal linear transform to combine the audio and visual information and describe an implementation that avoids the numerical problems caused by computing the correlation matrices.
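The idea in the abstract can be illustrated with a minimal sketch: learn one projection direction per modality with canonical correlation, project both streams onto a single axis, and score synchronization with Pearson's correlation. Everything below is an assumption made for illustration and is not taken from the paper: the function names `first_canonical_pair` and `sync_score`, the synthetic "audio" and "video" features, and the QR-plus-SVD route to canonical correlation. That route is one standard way to avoid forming correlation matrices explicitly; the abstract does not say which method the authors use.

```python
import numpy as np

def first_canonical_pair(audio, video):
    """Return (w_a, w_v, r): the first canonical directions and correlation.

    audio: (n_frames, d_a) array, video: (n_frames, d_v) array.
    Assumes n_frames exceeds both feature counts and that both
    matrices have full column rank (illustrative simplification).
    """
    A = audio - audio.mean(axis=0)
    V = video - video.mean(axis=0)
    # QR factorizations give orthonormal bases without ever forming
    # A.T @ A or V.T @ V, which can be badly conditioned.
    Qa, Ra = np.linalg.qr(A)
    Qv, Rv = np.linalg.qr(V)
    # Singular values of Qa.T @ Qv are the canonical correlations.
    U, s, Wt = np.linalg.svd(Qa.T @ Qv)
    # Map the leading singular vectors back to feature space.
    w_a = np.linalg.solve(Ra, U[:, 0])
    w_v = np.linalg.solve(Rv, Wt[0, :])
    return w_a, w_v, s[0]

def sync_score(audio, video, w_a, w_v):
    """Pearson correlation between the two one-dimensional projections."""
    return np.corrcoef(audio @ w_a, video @ w_v)[0, 1]

# Synthetic demo (hypothetical data): both streams share one latent signal.
rng = np.random.default_rng(0)
n_frames = 500
latent = rng.standard_normal(n_frames)
audio = np.outer(latent, rng.standard_normal(4)) \
    + 0.1 * rng.standard_normal((n_frames, 4))
video = np.outer(latent, rng.standard_normal(6)) \
    + 0.1 * rng.standard_normal((n_frames, 6))

w_a, w_v, r = first_canonical_pair(audio, video)
aligned = sync_score(audio, video, w_a, w_v)
# Shift the video by 25 frames to simulate a loss of synchronization.
shifted = sync_score(audio, np.roll(video, 25, axis=0), w_a, w_v)
```

In this toy setup the score on the aligned streams equals the first canonical correlation, while the score on the shifted video falls toward zero, which is the kind of contrast a synchronization measure needs.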
Cite
Text
Slaney and Covell. "FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks." Neural Information Processing Systems, 2000.
Markdown
[Slaney and Covell. "FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks." Neural Information Processing Systems, 2000.](https://mlanthology.org/neurips/2000/slaney2000neurips-facesync/)
BibTeX
@inproceedings{slaney2000neurips-facesync,
title = {{FaceSync: A Linear Operator for Measuring Synchronization of Video Facial Images and Audio Tracks}},
author = {Slaney, Malcolm and Covell, Michele},
booktitle = {Neural Information Processing Systems},
year = {2000},
pages = {814-820},
url = {https://mlanthology.org/neurips/2000/slaney2000neurips-facesync/}
}