FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos

Abstract

We present a deep learning-based approach that, given a pair of facial performance videos, automatically returns a synchronized version of the pair. Traditional methods require precise facial landmark tracking and/or clean audio, and are therefore sensitive to tracking inaccuracies and audio noise. To alleviate these issues, our approach leverages large-scale video datasets with their associated audio tracks and trains a deep network to learn audio descriptors for individual video frames. We then use these descriptors to build a frame-to-frame cost matrix and extract a low-cost non-linear synchronization path. Both quantitative and qualitative evaluations show that our approach outperforms existing state-of-the-art methods.
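
To make the alignment step concrete, below is a minimal Python sketch (ours, not the authors' released code) of the pipeline the abstract describes: pairwise distances between per-frame audio descriptors form a cost matrix, and dynamic time warping is one standard way to extract a low-cost monotonic synchronization path from it. The descriptor source is assumed to be a trained network; emb_a and emb_b below are hypothetical stand-ins for its per-frame outputs.

import numpy as np

def cost_matrix(emb_a: np.ndarray, emb_b: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances between two sequences of frame embeddings."""
    a = emb_a / np.linalg.norm(emb_a, axis=1, keepdims=True)
    b = emb_b / np.linalg.norm(emb_b, axis=1, keepdims=True)
    return 1.0 - a @ b.T  # shape: (num_frames_a, num_frames_b)

def dtw_path(C: np.ndarray) -> list[tuple[int, int]]:
    """Lowest-cost monotonic path through cost matrix C (classic DTW)."""
    n, m = C.shape
    D = np.full((n + 1, m + 1), np.inf)  # accumulated-cost table
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = C[i - 1, j - 1] + min(D[i - 1, j],      # insertion
                                            D[i, j - 1],      # deletion
                                            D[i - 1, j - 1])  # match
    # Backtrack from the end to recover the alignment path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

Warping one video's frame indices along the returned path yields the synchronized pair; the non-linearity of the path is what accommodates local tempo differences between the two performances.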

Cite

Text

Cho et al. "FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00458

Markdown

[Cho et al. "FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/cho2019iccvw-facesyncnet/) doi:10.1109/ICCVW.2019.00458

BibTeX

@inproceedings{cho2019iccvw-facesyncnet,
  title     = {{FaceSyncNet: A Deep Learning-Based Approach for Non-Linear Synchronization of Facial Performance Videos}},
  author    = {Cho, Yoonjae and Kim, Dohyeong and Truman, Edwin and Bazin, Jean-Charles},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {3703--3707},
  doi       = {10.1109/ICCVW.2019.00458},
  url       = {https://mlanthology.org/iccvw/2019/cho2019iccvw-facesyncnet/}
}