Everybody Dance Now

Abstract

This paper presents a simple method for "do as I do" motion transfer: given a source video of a person dancing, we can transfer that performance to a novel (amateur) target after only a few minutes of the target subject performing standard moves. We approach this problem as video-to-video translation using pose as an intermediate representation. To transfer the motion, we extract poses from the source subject and apply the learned pose-to-appearance mapping to generate the target subject. We predict two consecutive frames for temporally coherent video results and introduce a separate pipeline for realistic face synthesis. Although our method is quite simple, it produces surprisingly compelling results (see video). This motivates us to also provide a forensics tool for reliable synthetic content detection, which is able to distinguish videos synthesized by our system from real data. In addition, we release a first-of-its-kind open-source dataset of videos that can be legally used for training and motion transfer.
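
The data flow described in the abstract can be sketched roughly as follows. This is a toy illustration only, not the authors' implementation: `extract_pose`, `generate_frame`, and `transfer_motion` are hypothetical names, frames and poses are plain lists of numbers rather than images and keypoints, and conditioning each output on the previously generated frame is an assumed stand-in for the temporal coherence idea, not a description of the paper's actual training setup.

```python
from typing import Callable, List, Optional

Frame = List[float]  # toy stand-in for an image
Pose = List[float]   # toy stand-in for a pose representation


def extract_pose(frame: Frame) -> Pose:
    """Hypothetical pose extractor; a real system would run a pose detector.

    Here it simply halves each value so the example stays self-contained.
    """
    return [0.5 * v for v in frame]


def generate_frame(pose: Pose, previous: Optional[Frame]) -> Frame:
    """Hypothetical pose-to-appearance mapping for the target subject.

    Blends the current pose with the previously generated frame to mimic
    temporal conditioning (an assumption made for illustration).
    """
    if previous is None:
        previous = [0.0] * len(pose)
    return [0.75 * p + 0.25 * q for p, q in zip(pose, previous)]


def transfer_motion(
    source_video: List[Frame],
    pose_fn: Callable[[Frame], Pose] = extract_pose,
    generator: Callable[[Pose, Optional[Frame]], Frame] = generate_frame,
) -> List[Frame]:
    """Source frames -> poses -> generated target frames, one frame at a time."""
    outputs: List[Frame] = []
    previous: Optional[Frame] = None
    for frame in source_video:
        pose = pose_fn(frame)             # intermediate pose representation
        out = generator(pose, previous)   # render the target in that pose
        outputs.append(out)
        previous = out                    # feed the result into the next step
    return outputs


if __name__ == "__main__":
    print(transfer_motion([[4.0, 8.0], [8.0, 4.0]]))
```

The point of the sketch is the separation of concerns: pose is the only information carried over from the source video, and everything about the target's appearance lives in the generator.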

Cite

Text

Chan et al. "Everybody Dance Now." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019. doi:10.1109/ICCV.2019.00603

Markdown

[Chan et al. "Everybody Dance Now." Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019.](https://mlanthology.org/iccv/2019/chan2019iccv-everybody/) doi:10.1109/ICCV.2019.00603

BibTeX

@inproceedings{chan2019iccv-everybody,
  title     = {{Everybody Dance Now}},
  author    = {Chan, Caroline and Ginosar, Shiry and Zhou, Tinghui and Efros, Alexei A.},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year      = {2019},
  doi       = {10.1109/ICCV.2019.00603},
  url       = {https://mlanthology.org/iccv/2019/chan2019iccv-everybody/}
}