Facial Expression Recognition Based on Multi-Modal Features for Videos in the Wild
Abstract
This paper presents our work for the Expression Classification Challenge of the 5th Affective Behavior Analysis in-the-wild (ABAW) Competition. In our method, multi-modal features are extracted by several different pre-trained models and combined in different ways to capture more effective emotion information. Specifically, we extract efficient facial expression features using an MAE encoder pre-trained on a large-scale face dataset. For these combinations of visual and audio modal features, we utilize two kinds of temporal encoders to explore the temporal contextual information in the data. In addition, we employ several ensemble strategies under different experimental settings to obtain the most accurate expression recognition results. Our system achieves an average F1 score of 0.4072 on the Aff-Wild2 test set, ranking 2nd, which demonstrates the effectiveness of our method.
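The pipeline the abstract describes (per-frame visual and audio features fused and passed through a temporal encoder, then classified per frame) can be sketched roughly as follows. This is a minimal, hedged illustration, not the authors' implementation: all dimensions, the single-head attention temporal encoder, and the random stand-in features are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (illustrative, not from the paper):
# frames per clip, visual (MAE) feature dim, audio feature dim, encoder dim.
T, D_VIS, D_AUD, D_MODEL = 16, 768, 128, 256

# Per-frame features from the pre-trained extractors (random stand-ins here).
visual = rng.standard_normal((T, D_VIS))
audio = rng.standard_normal((T, D_AUD))

# Early fusion: concatenate the modalities, then project to the model dim.
fused = np.concatenate([visual, audio], axis=-1)            # (T, D_VIS + D_AUD)
W_in = rng.standard_normal((D_VIS + D_AUD, D_MODEL)) * 0.02
x = fused @ W_in                                            # (T, D_MODEL)

def self_attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    scores = q @ k.T / np.sqrt(k.shape[-1])                 # (T, T)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)          # row-wise softmax
    return weights @ v                                      # (T, D_MODEL)

W_q, W_k, W_v = (rng.standard_normal((D_MODEL, D_MODEL)) * 0.02 for _ in range(3))
context = self_attention(x, W_q, W_k, W_v)

# Per-frame logits over the 8 Aff-Wild2 expression classes.
W_cls = rng.standard_normal((D_MODEL, 8)) * 0.02
logits = context @ W_cls
print(logits.shape)  # (16, 8)
```

In the paper's setting, the random projections above would be learned weights, and a second kind of temporal encoder plus ensembling over feature combinations would refine the per-frame predictions.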
Cite
Text
Liu et al. "Facial Expression Recognition Based on Multi-Modal Features for Videos in the Wild." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00624
Markdown
[Liu et al. "Facial Expression Recognition Based on Multi-Modal Features for Videos in the Wild." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/liu2023cvprw-facial/) doi:10.1109/CVPRW59228.2023.00624
BibTeX
@inproceedings{liu2023cvprw-facial,
title = {{Facial Expression Recognition Based on Multi-Modal Features for Videos in the Wild}},
author = {Liu, Chuanhe and Zhang, Xinjie and Liu, Xiaolong and Zhang, Tenggan and Meng, Liyu and Liu, Yuchen and Deng, Yuanyuan and Jiang, Wenqiang},
booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
year = {2023},
pages = {5872--5879},
doi = {10.1109/CVPRW59228.2023.00624},
url = {https://mlanthology.org/cvprw/2023/liu2023cvprw-facial/}
}