ABAW7 Challenge: A Facial Affect Recognition Approach Based on Transformer Encoder and Multilayer Perceptron

Liu, Xuxiong; Shen, Kang; Yao, Jun; Wang, Boyan; Wang, Yu; Guan, Yujie; Liu, Xin; Li, Gengchen; An, Liuwei; Cui, Zishun; Liu, Minrui; Sun, Xiao; Feng, Weijie

doi:10.1007/978-3-031-91581-9_19

ECCVW 2024 pp. 267-281

Abstract

In this paper, we present our solution for the 7th Affective Behavior Analysis in-the-wild (ABAW) competition, which comprises two sub-challenges: Multi-Task Learning (MTL) and Compound Expression (CE) recognition. The Multi-Task Learning sub-challenge covers three tasks: Valence-Arousal (VA) estimation, Action Unit (AU) detection, and facial expression (EXPR) recognition. The 7th ABAW competition focuses on facial expression recognition datasets based on different modalities. In our work, we use multiple models to extract visual features and integrate these features with a Transformer Encoder. To mitigate the effect of large differences in dimensionality among the features, we design an affine module that projects the different features to the same dimension. For compound expressions, we propose an ensemble learning-based solution: we train two distinct facial expression classification models, one based on a convolutional network and one based on a vision transformer, and combine their outputs by late fusion to predict the final labels. Extensive experiments demonstrate the effectiveness of the proposed method. In the Multi-Task Learning sub-challenge, we achieved a total score of 1.1776, with an F1 score of 0.4997 for AU, a CCC of 0.378 for VA, and an F1 score of 0.2997 for EXPR. In the Compound Expression sub-challenge, we achieved an F1 score of 0.228. The results for both sub-challenges significantly outperform the baselines.
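The abstract states that features from several visual extractors are aligned to one dimension by an affine module, fused by a Transformer Encoder, and (per the title) decoded by multilayer perceptrons. The following is a minimal pure-Python sketch of that data flow, not the authors' implementation. Everything concrete in it is an assumption: the extractor names and feature sizes, the shared dimension `D`, the untrained random weights, mean pooling, and the head sizes (12 AUs, 2 VA values, 8 expression classes). A single-head self-attention layer with a residual connection stands in for the full Transformer Encoder.

```python
import math
import random

random.seed(0)  # reproducible toy weights and inputs

D = 16       # hypothetical shared dimension produced by the affine module
HIDDEN = 32  # hypothetical MLP hidden width
# Hypothetical per-extractor feature sizes; not taken from the paper.
FEATURE_DIMS = {"extractor_a": 768, "extractor_b": 512, "extractor_c": 1024}
# Assumed head sizes: 12 AUs, 2 VA values, 8 expression classes.
HEAD_SIZES = {"AU": 12, "VA": 2, "EXPR": 8}


def random_linear(in_dim, out_dim):
    """Untrained (weight, bias) pair; a real model would learn these."""
    scale = 1.0 / math.sqrt(in_dim)
    weight = [[random.uniform(-scale, scale) for _ in range(in_dim)]
              for _ in range(out_dim)]
    return weight, [0.0] * out_dim


def linear(x, weight, bias):
    """Affine map y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp(x, layer1, layer2):
    """Two-layer perceptron with a ReLU hidden layer."""
    hidden = [max(0.0, h) for h in linear(x, *layer1)]
    return linear(hidden, *layer2)


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention plus a residual connection."""
    d = len(tokens[0])
    q = [linear(t, *wq) for t in tokens]
    k = [linear(t, *wk) for t in tokens]
    v = [linear(t, *wv) for t in tokens]
    out = []
    for t, qi in zip(tokens, q):
        attn = softmax([sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k])
        mixed = [sum(w * vj[c] for w, vj in zip(attn, v)) for c in range(d)]
        out.append([a + b for a, b in zip(t, mixed)])
    return out


affine = {name: random_linear(dim, D) for name, dim in FEATURE_DIMS.items()}
wq, wk, wv = (random_linear(D, D) for _ in range(3))
heads = {name: (random_linear(D, HIDDEN), random_linear(HIDDEN, size))
         for name, size in HEAD_SIZES.items()}


def forward(features):
    # 1) Affine module: map each feature vector to the shared dimension D (one token each).
    tokens = [linear(features[name], *affine[name]) for name in FEATURE_DIMS]
    # 2) Encoder stand-in: the aligned tokens attend to one another.
    encoded = self_attention(tokens, wq, wk, wv)
    # 3) Mean-pool the tokens into one fused vector.
    fused = [sum(col) / len(encoded) for col in zip(*encoded)]
    # 4) MLP heads with task-appropriate output activations.
    au = [1.0 / (1.0 + math.exp(-z)) for z in mlp(fused, *heads["AU"])]
    va = [math.tanh(z) for z in mlp(fused, *heads["VA"])]
    expr = softmax(mlp(fused, *heads["EXPR"]))
    return au, va, expr


sample = {name: [random.gauss(0.0, 1.0) for _ in range(dim)]
          for name, dim in FEATURE_DIMS.items()}
au, va, expr = forward(sample)
print(len(au), len(va), len(expr))  # 12 2 8
```

A full Transformer Encoder layer would also include multi-head attention, layer normalization, and a position-wise feed-forward sublayer; they are left out here so the alignment-then-fusion flow stays visible. The affine step is what makes the fusion possible: attention needs every token to have the same width, so the 768-, 512-, and 1024-dimensional inputs must first be mapped to `D`.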
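For the compound expression task, the abstract describes late fusion of a convolutional model and a vision-transformer model. One common way to do decision-level fusion is a weighted average of the two models' class probabilities followed by an argmax; the sketch below assumes that rule, an equal default weight, and a hypothetical seven-class label order. The probability vectors are made-up values, not model outputs from the paper.

```python
# Hypothetical class order; assumed to be the seven compound classes of the ABAW CE task.
CLASSES = [
    "Fearfully Surprised", "Happily Surprised", "Sadly Surprised",
    "Disgustedly Surprised", "Angrily Surprised", "Sadly Fearful", "Sadly Angry",
]


def late_fusion(prob_cnn, prob_vit, weight_cnn=0.5):
    """Weighted average of two models' class-probability vectors (decision-level fusion)."""
    if len(prob_cnn) != len(prob_vit):
        raise ValueError("both models must score the same classes")
    if not 0.0 <= weight_cnn <= 1.0:
        raise ValueError("weight_cnn must lie in [0, 1]")
    return [weight_cnn * p + (1.0 - weight_cnn) * q for p, q in zip(prob_cnn, prob_vit)]


def predict(prob_cnn, prob_vit, weight_cnn=0.5):
    """Return the fused label and the fused probability vector."""
    fused = late_fusion(prob_cnn, prob_vit, weight_cnn)
    best = max(range(len(fused)), key=fused.__getitem__)
    return CLASSES[best], fused


# Made-up softmax outputs for one face image, for illustration only.
cnn_probs = [0.10, 0.40, 0.05, 0.05, 0.20, 0.10, 0.10]
vit_probs = [0.05, 0.20, 0.05, 0.05, 0.50, 0.10, 0.05]
label, fused = predict(cnn_probs, vit_probs)
print(label)  # Angrily Surprised
```

Setting `weight_cnn=1.0` reduces the prediction to the CNN alone (here Happily Surprised), which shows how the second model can change the decision. How the authors actually weighted the two models is not stated in the abstract; a fusion weight would typically be tuned on validation data.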
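The reported Multi-Task total is consistent with the usual ABAW multi-task measure, which adds the mean AU F1, the VA CCC (averaged over valence and arousal), and the EXPR macro F1. The plain-Python sketch below computes these quantities; the function names are hypothetical, and the exact averaging rules come from the challenge protocol rather than from this abstract. Summing the rounded per-task values from the abstract gives 1.1774, which agrees with the reported 1.1776 within the rounding of the individual scores (the VA CCC is given to only three decimals).

```python
def ccc(x, y):
    """Concordance correlation coefficient between predictions x and labels y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2.0 * cov / (vx + vy + (mx - my) ** 2)


def va_score(v_pred, v_true, a_pred, a_true):
    """Mean CCC over valence and arousal."""
    return (ccc(v_pred, v_true) + ccc(a_pred, a_true)) / 2.0


def f1(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores (EXPR)."""
    scores = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / num_classes


def mean_au_f1(true_rows, pred_rows):
    """Average binary F1 over AU columns; each row is a per-frame 0/1 vector."""
    num_aus = len(true_rows[0])
    scores = []
    for j in range(num_aus):
        tp = sum(t[j] == 1 and p[j] == 1 for t, p in zip(true_rows, pred_rows))
        fp = sum(t[j] == 0 and p[j] == 1 for t, p in zip(true_rows, pred_rows))
        fn = sum(t[j] == 1 and p[j] == 0 for t, p in zip(true_rows, pred_rows))
        scores.append(f1(tp, fp, fn))
    return sum(scores) / num_aus


def mtl_score(au_f1, va_ccc, expr_f1):
    """Assumed total: plain sum of the three task scores."""
    return au_f1 + va_ccc + expr_f1


print(round(mtl_score(0.4997, 0.378, 0.2997), 4))  # 1.1774
```

Note that `ccc` is undefined when both sequences are constant with equal means (the denominator is zero); a production metric would guard that case.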

Cite

Text

Liu et al. "ABAW7 Challenge: A Facial Affect Recognition Approach Based on Transformer Encoder and Multilayer Perceptron." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91581-9_19

Markdown

[Liu et al. "ABAW7 Challenge: A Facial Affect Recognition Approach Based on Transformer Encoder and Multilayer Perceptron." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/liu2024eccvw-abaw7/) doi:10.1007/978-3-031-91581-9_19

BibTeX

@inproceedings{liu2024eccvw-abaw7,
  title     = {{ABAW7 Challenge: A Facial Affect Recognition Approach Based on Transformer Encoder and Multilayer Perceptron}},
  author    = {Liu, Xuxiong and Shen, Kang and Yao, Jun and Wang, Boyan and Wang, Yu and Guan, Yujie and Liu, Xin and Li, Gengchen and An, Liuwei and Cui, Zishun and Liu, Minrui and Sun, Xiao and Feng, Weijie},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {267-281},
  doi       = {10.1007/978-3-031-91581-9_19},
  url       = {https://mlanthology.org/eccvw/2024/liu2024eccvw-abaw7/}
}