Cascaded Siamese Self-Supervised Audio to Video GAN

Abstract

Generating meaningful videos synchronised to audio signals is a complex synthesis task: the output must be not only realistic but also move coherently with the provided audio. Although considerable effort has gone into audio-to-video generative models, these models rely heavily on supervised signals such as face/body key points or 3D meshes. Key point annotation, however, requires time and effort, and some dataset domains have no predictable structure, which makes extracting points of interest infeasible. Our proposed model consists of a cascaded generator-discriminator architecture that works at the pixel level to generate videos according to the associated soundtracks. Instead of relying on supervised signals, it adopts a new self-supervised temporal augmentation technique to optimise the correlation between the audio signal and the generated video. Extensive experiments comparing different models across two datasets demonstrate the effectiveness of the proposed architecture.
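
The abstract does not spell out the temporal augmentation, so the following is only an illustrative sketch of one common way to obtain a self-supervised synchronisation signal, not the authors' method. It assumes a clip is a list of per-frame items with one audio feature per frame, and it builds an aligned (label 1) and a temporally shifted (label 0) audio/video pair from the same clip. The name `make_sync_pairs` and its parameters are hypothetical.

```python
import random


def make_sync_pairs(frames, audio, min_shift=1, rng=None):
    """Build one aligned and one misaligned audio/video pair from a clip.

    Assumption (not from the paper): misalignment is simulated by a
    circular shift of the frame order while the audio stays fixed.
    """
    n = len(frames)
    if n != len(audio):
        raise ValueError("frames and audio must have the same length")
    if n < 2 or not 1 <= min_shift <= n - 1:
        raise ValueError("need at least 2 frames and 1 <= min_shift <= n - 1")

    rng = rng or random.Random()
    # A shift in [min_shift, n - 1] never maps the clip back onto itself.
    shift = rng.randint(min_shift, n - 1)
    shifted = frames[shift:] + frames[:shift]

    positive = (list(frames), list(audio), 1)  # in sync
    negative = (shifted, list(audio), 0)       # out of sync
    return positive, negative


if __name__ == "__main__":
    frames = [f"frame_{i}" for i in range(8)]
    audio = [f"audio_{i}" for i in range(8)]
    pos, neg = make_sync_pairs(frames, audio, rng=random.Random(0))
    print(pos[0][:3], pos[2])
    print(neg[0][:3], neg[2])
```

In a sketch like this, a Siamese-style discriminator could score both pairs with shared weights and be trained to separate them, which needs no key point or mesh annotation. Whether the paper uses shifting, shuffling, or another augmentation is not stated in the abstract.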

Cite

Text

Aldausari et al. "Cascaded Siamese Self-Supervised Audio to Video GAN." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022. doi:10.1109/CVPRW56347.2022.00515

Markdown

[Aldausari et al. "Cascaded Siamese Self-Supervised Audio to Video GAN." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022.](https://mlanthology.org/cvprw/2022/aldausari2022cvprw-cascaded/) doi:10.1109/CVPRW56347.2022.00515

BibTeX

@inproceedings{aldausari2022cvprw-cascaded,
  title     = {{Cascaded Siamese Self-Supervised Audio to Video GAN}},
  author    = {Aldausari, Nuha and Sowmya, Arcot and Marcus, Nadine and Mohammadi, Gelareh},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2022},
  pages     = {4690--4699},
  doi       = {10.1109/CVPRW56347.2022.00515},
  url       = {https://mlanthology.org/cvprw/2022/aldausari2022cvprw-cascaded/}
}