MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition
Abstract
Facial action unit (AU) and basic expression recognition are two fundamental tasks in human affective behavior analysis. Most existing methods are developed for restricted scenarios, which makes them impractical for in-the-wild settings. The Affective Behavior Analysis in-the-wild (ABAW) 2021 Contest provides a benchmark for this in-the-wild problem. In this paper, we propose a multi-task and multi-modal sequence network (MTMSN) to mine the relationships between these two tasks and make effective use of both the visual and audio information in the video. We train the model with both AU and expression annotations and apply a sequence model to further extract associations between video frames. We achieve an AU score of 0.7508 and an expression score of 0.7574 on the validation set.
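The abstract states that the model is trained with both AU and expression annotations. The pure-Python function below is a minimal sketch of how such a multi-task objective is commonly set up, not the authors' loss. It assumes (1) 12 binary AU labels and 7 expression classes per frame, (2) missing annotations encoded as -1, and (3) a weighted sum of a binary cross-entropy AU term and a softmax cross-entropy expression term. Unlabeled entries are skipped, so a frame annotated for only one task still contributes to that task. The names `multitask_loss`, `au_weight`, `expr_weight`, and `MISSING` are hypothetical, and the audio and sequence components of MTMSN are not modeled here.

```python
import math
from typing import List, Sequence

# Assumed label layout (not stated in the abstract): 12 binary AUs and
# 7 expression classes per frame, with -1 marking a missing annotation.
MISSING = -1


def bce_with_logits(logit: float, target: int) -> float:
    """Numerically stable binary cross-entropy computed from a raw logit."""
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Softmax cross-entropy using the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def multitask_loss(
    au_logits: List[List[float]],
    au_labels: List[List[int]],
    expr_logits: List[List[float]],
    expr_labels: List[int],
    au_weight: float = 1.0,  # hypothetical task weights
    expr_weight: float = 1.0,
) -> float:
    """Hypothetical joint AU + expression loss over a batch of frames.

    The AU term is averaged over labeled (frame, AU) pairs and the
    expression term over labeled frames; MISSING labels are skipped.
    """
    au_terms = [
        bce_with_logits(z, y)
        for frame_logits, frame_labels in zip(au_logits, au_labels)
        for z, y in zip(frame_logits, frame_labels)
        if y != MISSING
    ]
    expr_terms = [
        cross_entropy(frame_logits, y)
        for frame_logits, y in zip(expr_logits, expr_labels)
        if y != MISSING
    ]
    au_loss = sum(au_terms) / len(au_terms) if au_terms else 0.0
    expr_loss = sum(expr_terms) / len(expr_terms) if expr_terms else 0.0
    return au_weight * au_loss + expr_weight * expr_loss


# Frame 0 has both annotations; frame 1 has neither and is ignored.
# Zero logits give uniform predictions, so loss == log(2) + log(7).
loss = multitask_loss(
    [[0.0] * 12, [0.0] * 12], [[1] * 12, [-1] * 12],
    [[0.0] * 7, [0.0] * 7], [2, -1],
)
```

In an actual training loop these would be batched tensor operations (for example PyTorch's `CrossEntropyLoss(ignore_index=-1)` for expressions and a masked `BCEWithLogitsLoss(reduction="none")` for AUs); the scalar version only makes the masking explicit.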
Cite
Text
Jin et al. "MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2021. doi:10.1109/ICCVW54120.2021.00401
Markdown
[Jin et al. "MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition." IEEE/CVF International Conference on Computer Vision Workshops, 2021.](https://mlanthology.org/iccvw/2021/jin2021iccvw-mtmsn/) doi:10.1109/ICCVW54120.2021.00401
BibTeX
@inproceedings{jin2021iccvw-mtmsn,
title = {{MTMSN: Multi-Task and Multi-Modal Sequence Network for Facial Action Unit and Expression Recognition}},
author = {Jin, Yue and Zheng, Tianqing and Gao, Chao and Xu, Guoqiang},
booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
year = {2021},
pages = {3590--3595},
doi = {10.1109/ICCVW54120.2021.00401},
url = {https://mlanthology.org/iccvw/2021/jin2021iccvw-mtmsn/}
}