ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion

Abstract

In this paper, we present our approach to the 5th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). The competition comprises four sub-challenges: Valence-Arousal (VA) Estimation, Expression (Expr) Classification, Action Unit (AU) Detection, and Emotional Reaction Intensity (ERI) Estimation. To address these challenges, we leverage state-of-the-art (SOTA) models to extract robust audio and visual features. These features are then fused using a Transformer Encoder for the VA, Expr, and AU sub-challenges, and using TEMMA for the ERI sub-challenge. To mitigate the effect of disparate feature dimensions, we introduce an Affine Module that aligns the features to a common dimension. Overall, our results outperform the baseline by a substantial margin across all four sub-challenges. Specifically, for the VA Estimation sub-challenge, our method attains a mean Concordance Correlation Coefficient (CCC) of 0.5342, ranking fifth overall. For the Expression Classification sub-challenge, our approach achieves an average F1 Score of 0.3337, placing fourth overall. For the AU Detection sub-challenge, our method obtains an average F1 Score of 0.4752. Lastly, for the Emotional Reaction Intensity Estimation sub-challenge, our approach yields an average Pearson's correlation coefficient of 0.3968.
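The VA and ERI results above are reported as correlation scores, which are easy to confuse. The sketch below shows how CCC differs from Pearson's correlation: CCC additionally penalizes differences in mean and scale between predictions and labels. This is a minimal illustration under stated assumptions, not the official challenge evaluation script. It uses population (biased) moments, and the helper names (`ccc`, `pearson`, `mean_va_ccc`) are hypothetical.

```python
from math import sqrt
from statistics import fmean


def _moments(pred, gold):
    """Return means, population variances and covariance of two equal-length sequences."""
    n = len(pred)
    mp, mg = fmean(pred), fmean(gold)
    vp = sum((p - mp) ** 2 for p in pred) / n
    vg = sum((g - mg) ** 2 for g in gold) / n
    cov = sum((p - mp) * (g - mg) for p, g in zip(pred, gold)) / n
    return mp, mg, vp, vg, cov


def ccc(pred, gold):
    """Concordance Correlation Coefficient: 2*cov / (var_p + var_g + (mean_p - mean_g)^2)."""
    mp, mg, vp, vg, cov = _moments(pred, gold)
    return 2 * cov / (vp + vg + (mp - mg) ** 2)


def pearson(pred, gold):
    """Pearson's correlation: cov / (std_p * std_g); insensitive to shifts in mean or scale."""
    _, _, vp, vg, cov = _moments(pred, gold)
    return cov / sqrt(vp * vg)


def mean_va_ccc(val_pred, val_gold, aro_pred, aro_gold):
    """Assumed VA score: the mean of the valence CCC and the arousal CCC."""
    return (ccc(val_pred, val_gold) + ccc(aro_pred, aro_gold)) / 2


if __name__ == "__main__":
    gold = [0.0, 1.0, 2.0, 3.0]
    shifted = [g + 1.0 for g in gold]  # perfectly correlated, but offset by +1
    print(pearson(shifted, gold))  # 1.0: Pearson ignores the offset
    print(ccc(shifted, gold))      # about 0.714: CCC penalizes the offset
```

Because of this mean and scale penalty, a model can score well on Pearson's correlation (the ERI metric) while scoring lower on CCC (the VA metric) if its predictions are systematically biased.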

Cite

Text

Zhang et al. "ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023. doi:10.1109/CVPRW59228.2023.00607

Markdown

[Zhang et al. "ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2023.](https://mlanthology.org/cvprw/2023/zhang2023cvprw-abaw5/) doi:10.1109/CVPRW59228.2023.00607

BibTeX

@inproceedings{zhang2023cvprw-abaw5,
  title     = {{ABAW5 Challenge: A Facial Affect Recognition Approach Utilizing Transformer Encoder and Audiovisual Fusion}},
  author    = {Zhang, Ziyang and An, Liuwei and Cui, Zishun and Xu, Ao and Dong, Tengteng and Jiang, Yueqi and Shi, Jingyi and Liu, Xin and Sun, Xiao and Wang, Meng},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2023},
  pages     = {5725-5734},
  doi       = {10.1109/CVPRW59228.2023.00607},
  url       = {https://mlanthology.org/cvprw/2023/zhang2023cvprw-abaw5/}
}