Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss

Abstract

We present our winning submission to the challenge of the First International Workshop on Bodily Expressed Emotion Understanding (BEEU). Drawing on recent literature on the effect of context and environment on emotion, and on visual representations that carry semantic meaning through word embeddings, we extend the Temporal Segment Network framework to incorporate both. Our method is verified on the validation set of the Body Language Dataset (BoLD) and achieves an Emotion Recognition Score of 0.26235 on the test set, surpassing the previous best result of 0.2530.

Cite

Text

Filntisis et al. "Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-66415-2_52

Markdown

[Filntisis et al. "Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/filntisis2020eccvw-emotion/) doi:10.1007/978-3-030-66415-2_52

BibTeX

@inproceedings{filntisis2020eccvw-emotion,
  title     = {{Emotion Understanding in Videos Through Body, Context, and Visual-Semantic Embedding Loss}},
  author    = {Filntisis, Panagiotis Paraskevas and Efthymiou, Niki and Potamianos, Gerasimos and Maragos, Petros},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2020},
  pages     = {747--755},
  doi       = {10.1007/978-3-030-66415-2_52},
  url       = {https://mlanthology.org/eccvw/2020/filntisis2020eccvw-emotion/}
}