Unsupervised Deep Representations for Learning Audience Facial Behaviors

Abstract

In this paper, we present an unsupervised learning approach to analyzing facial behavior, based on a deep generative model combined with a convolutional neural network (CNN). We jointly train a variational auto-encoder (VAE) and a generative adversarial network (GAN) to learn a powerful latent representation from footage of audiences viewing feature-length movies. We show that the learned latent representation successfully encodes meaningful signatures of behaviors related to audience engagement (smiling and laughing) and disengagement (yawning). Our results provide a proof of concept for a more general methodology for annotating hard-to-label multimedia data featuring sparse examples of signals of interest.
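The abstract describes a jointly trained VAE/GAN hybrid whose latent code serves as the learned representation of audience face crops. As a rough illustration of how such a hybrid can be set up, the sketch below (in PyTorch) couples a convolutional VAE with a discriminator on reconstructions; the architecture sizes, losses, and 64x64 grayscale input are assumptions for illustration only, not the authors' exact model or training recipe.

# Minimal VAE-GAN sketch, assuming PyTorch and 64x64 grayscale face crops.
# Illustrative only; not the paper's exact architecture or hyperparameters.
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 128  # assumed latent size

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, 2, 1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
        )
        self.fc_mu = nn.Linear(128 * 8 * 8, LATENT_DIM)
        self.fc_logvar = nn.Linear(128 * 8 * 8, LATENT_DIM)

    def forward(self, x):
        h = self.conv(x)
        return self.fc_mu(h), self.fc_logvar(h)

class Decoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(LATENT_DIM, 128 * 8 * 8)
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 1, 4, 2, 1), nn.Sigmoid(),  # 32 -> 64
        )

    def forward(self, z):
        h = self.fc(z).view(-1, 128, 8, 8)
        return self.deconv(h)

class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 1),
        )

    def forward(self, x):
        return self.net(x)

def reparameterize(mu, logvar):
    # Standard VAE reparameterization trick
    std = torch.exp(0.5 * logvar)
    return mu + std * torch.randn_like(std)

def vae_gan_losses(enc, dec, disc, x):
    """One joint step: VAE reconstruction + KL terms, plus GAN terms on reconstructions."""
    mu, logvar = enc(x)
    z = reparameterize(mu, logvar)
    x_rec = dec(z)

    # VAE objective: reconstruction + KL divergence to the unit Gaussian prior
    rec = F.binary_cross_entropy(x_rec, x, reduction='sum') / x.size(0)
    kld = -0.5 * torch.mean(torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=1))

    # Discriminator separates real frames from reconstructions
    d_real = disc(x)
    d_fake = disc(x_rec.detach())
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) + \
             F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))

    # Decoder is additionally rewarded for fooling the discriminator
    g_adv = F.binary_cross_entropy_with_logits(disc(x_rec), torch.ones_like(d_real))
    return rec + kld + g_adv, d_loss

After training on unlabeled audience footage, the encoder's latent codes (mu) would be the per-frame representation that downstream analyses of engagement-related behaviors operate on.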

Cite

Text

Saha et al. "Unsupervised Deep Representations for Learning Audience Facial Behaviors." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.

Markdown

[Saha et al. "Unsupervised Deep Representations for Learning Audience Facial Behaviors." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2018.](https://mlanthology.org/cvprw/2018/saha2018cvprw-unsupervised/)

BibTeX

@inproceedings{saha2018cvprw-unsupervised,
  title     = {{Unsupervised Deep Representations for Learning Audience Facial Behaviors}},
  author    = {Saha, Suman and Navarathna, Rajitha and Helminger, Leonhard and Weber, Romann M.},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2018},
  pages     = {1132--1137},
  url       = {https://mlanthology.org/cvprw/2018/saha2018cvprw-unsupervised/}
}