Estimation of Affective Level in the Wild with Multiple Memory Networks

Abstract

This paper presents our solution to the "affect in the wild" challenge, which aims to estimate the affective level, i.e., the valence and arousal values, of every frame in a video. A carefully designed deep convolutional neural network (a variation of residual network) for affective level estimation of facial expressions is first implemented as a baseline. Next, we use multiple memory networks to model the temporal relations between frames. Finally, ensemble models are used to combine the predictions from the multiple memory networks. Our proposed solution outperforms the baseline model by 10.62% in terms of mean square error (MSE).
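The final step described in the abstract, fusing per-frame (valence, arousal) predictions from several memory networks and scoring the result with MSE, can be sketched as follows. This is a minimal illustration assuming a simple averaging ensemble; the function names, the averaging rule, and the toy data are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def ensemble_predictions(model_outputs):
    """Average predictions from multiple models (a simple ensemble).

    model_outputs: list of arrays, each shaped (num_frames, 2),
                   holding per-frame (valence, arousal) values.
    Returns an array of the same shape with the ensembled values.
    """
    return np.mean(np.stack(model_outputs, axis=0), axis=0)

def mean_square_error(pred, target):
    """MSE averaged over all frames and both affect dimensions."""
    return float(np.mean((pred - target) ** 2))

# Toy usage: three hypothetical "memory networks" predicting 4 frames each,
# with valence/arousal targets in [-1, 1].
rng = np.random.default_rng(0)
target = rng.uniform(-1.0, 1.0, size=(4, 2))
outputs = [target + rng.normal(0.0, 0.1, size=(4, 2)) for _ in range(3)]

fused = ensemble_predictions(outputs)
print(mean_square_error(fused, target))
```

Averaging independent models' outputs reduces prediction variance, which is why such ensembles typically lower frame-level MSE relative to any single model.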

Cite

Text

Li et al. "Estimation of Affective Level in the Wild with Multiple Memory Networks." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017. doi:10.1109/CVPRW.2017.244

Markdown

[Li et al. "Estimation of Affective Level in the Wild with Multiple Memory Networks." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2017.](https://mlanthology.org/cvprw/2017/li2017cvprw-estimation/) doi:10.1109/CVPRW.2017.244

BibTeX

@inproceedings{li2017cvprw-estimation,
  title     = {{Estimation of Affective Level in the Wild with Multiple Memory Networks}},
  author    = {Li, Jianshu and Chen, Yunpeng and Xiao, Shengtao and Zhao, Jian and Roy, Sujoy and Feng, Jiashi and Yan, Shuicheng and Sim, Terence},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2017},
  pages     = {1947--1954},
  doi       = {10.1109/CVPRW.2017.244},
  url       = {https://mlanthology.org/cvprw/2017/li2017cvprw-estimation/}
}