FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos
Abstract
Given a pair of facial performance videos, we present a deep learning-based approach that automatically returns a synchronized version of these videos. Traditional methods require precise facial landmark tracking and/or clean audio, and are thus sensitive to tracking inaccuracies and audio noise. To alleviate these issues, our approach leverages large-scale video datasets along with their associated audio tracks and trains a deep network to learn an audio-based descriptor for each video frame. We then use these descriptors to build a cost matrix of frame-to-frame similarities and compute a low-cost non-linear synchronization path through it. Both quantitative and qualitative evaluations show that our approach outperforms existing state-of-the-art methods.
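The pipeline the abstract describes (per-frame audio descriptors, a pairwise cost matrix, then a low-cost non-linear alignment path) can be illustrated with a minimal sketch. The descriptor network itself is not reproduced here, so desc_a and desc_b below are hypothetical stand-ins for its per-frame embeddings, and cosine distance plus classic dynamic time warping are assumed for the similarity measure and path search, since the abstract does not specify either choice.

# Minimal sketch of the cost-matrix + non-linear alignment step.
# Assumptions (not from the paper): per-frame descriptors as numpy rows,
# cosine distance as the cost, and classic dynamic time warping (DTW)
# as the low-cost path search.
import numpy as np

def cost_matrix(desc_a: np.ndarray, desc_b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distance between frame descriptors (one row per frame)."""
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    return 1.0 - a @ b.T  # shape: (num_frames_a, num_frames_b)

def sync_path(cost: np.ndarray) -> list[tuple[int, int]]:
    """DTW: lowest-cost monotonic path from (0, 0) to (n-1, m-1)."""
    n, m = cost.shape
    acc = np.full((n, m), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            best_prev = min(
                acc[i - 1, j - 1] if i > 0 and j > 0 else np.inf,  # advance both videos
                acc[i - 1, j] if i > 0 else np.inf,                # advance video A only
                acc[i, j - 1] if j > 0 else np.inf,                # advance video B only
            )
            acc[i, j] = cost[i, j] + best_prev
    # Backtrack from the end to recover the frame-to-frame correspondences.
    i, j = n - 1, m - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1, j - 1], (i - 1, j - 1)))
        if i > 0:
            candidates.append((acc[i - 1, j], (i - 1, j)))
        if j > 0:
            candidates.append((acc[i, j - 1], (i, j - 1)))
        _, (i, j) = min(candidates, key=lambda t: t[0])
        path.append((i, j))
    return path[::-1]

# Usage with random stand-ins for learned per-frame audio descriptors:
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(90, 128))   # 90 frames, 128-D descriptors
desc_b = rng.normal(size=(110, 128))  # 110 frames, 128-D descriptors
path = sync_path(cost_matrix(desc_a, desc_b))  # list of (frame in A, frame in B)

The monotonic path constraint is what makes the alignment non-linear: the path may dwell on one video's frames while the other advances, warping time rather than applying a single global offset.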
Cite
Text
Cho et al. "FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00458
Markdown
[Cho et al. "FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/cho2019iccvw-facesyncnet/) doi:10.1109/ICCVW.2019.00458
BibTeX
@inproceedings{cho2019iccvw-facesyncnet,
title = {{FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos}},
author = {Cho, Yoonjae and Kim, Dohyeong and Truman, Edwin and Bazin, Jean-Charles},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2019},
pages = {3703--3707},
doi = {10.1109/ICCVW.2019.00458},
url = {https://mlanthology.org/iccvw/2019/cho2019iccvw-facesyncnet/}
}