Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining

Abstract

Video-based Compound Expression Recognition (CER) aims to identify compound expressions in everyday interactions on a frame-by-frame basis. Unlike the rapid progress in Facial Expression Recognition (FER) for basic emotions (e.g., surprised, sad, and fearful), CER for compound emotions (e.g., fearfully surprised and sadly fearful) remains under-explored, with an evident shortage of substantial datasets. In this paper, we design a framework to demonstrate the feasibility of predicting compound expressions in-the-wild without relying on domain-specific supervision. Specifically, we first train a model on a large-scale facial dataset using the Masked Autoencoder (MAE) approach to learn comprehensive facial features. Then, to tailor it for facial expression analysis, we fine-tune the ViT encoder on an Action Unit (AU) detection task. To address the issue of insufficient data, we recast compound emotion recognition as a multi-label recognition task over basic emotions. We train a network by fine-tuning the pretrained ViT encoder to predict the probability of each basic emotion, and then combine these probabilities to arrive at the final prediction for the compound emotions. Experiments conducted on the C-EXPR-DB dataset demonstrate the effectiveness of our framework in the frame-by-frame prediction of compound expressions in-the-wild. Our framework is recognized as the leading solution in the Compound Expression (CE) Recognition Challenge of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild (ABAW). More information about the competition can be found at: 6th ABAW.
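
The abstract does not state how the basic-emotion probabilities are combined, so the following is only a minimal sketch under stated assumptions: each compound emotion is scored as the product of the probabilities of its two component basic emotions, and the highest-scoring compound is returned for the frame. The names `COMPOUNDS`, `compound_scores`, and `predict_compound` are hypothetical, the product rule is an assumption rather than the authors' method, and the "sadly surprised" entry is included purely for illustration.

```python
# Hypothetical mapping from a compound label to its two basic components.
# "fearfully surprised" and "sadly fearful" appear in the abstract;
# "sadly surprised" is added here only to make the example less trivial.
COMPOUNDS = {
    "fearfully surprised": ("fearful", "surprised"),
    "sadly fearful": ("sad", "fearful"),
    "sadly surprised": ("sad", "surprised"),
}


def compound_scores(basic_probs):
    """Score each compound as the product of its two basic-emotion probabilities.

    basic_probs maps a basic emotion name to an independent (multi-label)
    probability in [0, 1], e.g. the sigmoid outputs of a classifier head.
    The product rule is an assumption made for this sketch.
    """
    return {
        name: basic_probs[first] * basic_probs[second]
        for name, (first, second) in COMPOUNDS.items()
    }


def predict_compound(basic_probs):
    """Return the compound label with the highest combined score."""
    scores = compound_scores(basic_probs)
    return max(scores, key=scores.get)


# Per-frame usage with made-up probabilities.
frame_probs = {"fearful": 0.8, "surprised": 0.7, "sad": 0.1}
print(predict_compound(frame_probs))
```

Because the basic emotions are treated as independent labels, their probabilities need not sum to one, which is what allows two of them to be high at the same time for a single frame.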

Cite

Text

Qiu et al. "Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00476

Markdown

[Qiu et al. "Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/qiu2024cvprw-learning/) doi:10.1109/CVPRW63382.2024.00476

BibTeX

@inproceedings{qiu2024cvprw-learning,
  title     = {{Learning Transferable Compound Expressions from Masked AutoEncoder Pretraining}},
  author    = {Qiu, Feng and Du, Heming and Zhang, Wei and Liu, Chen and Li, Lincheng and Guo, Tianchen and Yu, Xin},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {4733--4741},
  doi       = {10.1109/CVPRW63382.2024.00476},
  url       = {https://mlanthology.org/cvprw/2024/qiu2024cvprw-learning/}
}