Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction

Abstract

We introduce a novel methodology for assessing Emotional Mimicry Intensity (EMI) as part of the 6th Workshop and Competition on Affective Behavior Analysis in-the-wild. Our methodology utilises the Wav2Vec 2.0 architecture, pre-trained on an extensive podcast dataset, to capture a wide array of audio features that include both linguistic and paralinguistic components. We refine the feature extraction process with a fusion technique that combines individual features with a global mean vector, giving each feature access to the broader context of the recording. A key aspect of our approach is a multi-task fusion strategy that not only leverages these features but also incorporates a pre-trained Valence-Arousal-Dominance (VAD) model. This integration is designed to refine emotion intensity prediction by processing multiple emotional dimensions concurrently. For the temporal analysis of the audio data, our feature fusion process utilises a Long Short-Term Memory (LSTM) network. The approach relies solely on the provided audio data, shows marked improvement over the existing baseline, and achieved second place in the EMI challenge.
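The fusion of individual features with a global mean vector can be illustrated with a minimal sketch. The abstract does not say how the two are combined, so concatenation is assumed here; the function name `fuse_with_global_mean` and the toy two-dimensional frames are hypothetical and stand in for the Wav2Vec 2.0 features, which would be much larger. Plain Python lists are used only to keep the example self-contained.

```python
from typing import List


def fuse_with_global_mean(frames: List[List[float]]) -> List[List[float]]:
    """Append the mean vector over all frames to every frame vector.

    Assumption: fusion is done by concatenation. The abstract only states
    that individual features are combined with a global mean vector.
    """
    if not frames:
        return []
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frame vectors must have the same dimension")
    n = len(frames)
    # Global context: element-wise mean across the time axis.
    global_mean = [sum(frame[d] for frame in frames) / n for d in range(dim)]
    # Each fused vector holds the local frame followed by the global mean.
    return [list(frame) + global_mean for frame in frames]


# Toy input: three frames with two features each (hypothetical values).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
fused = fuse_with_global_mean(frames)
print(fused[0])  # [1.0, 2.0, 3.0, 4.0]
```

Under this assumption each fused vector has twice the original dimension, and a sequence model such as an LSTM would then consume the fused sequence frame by frame.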

Cite

Text

Hallmen et al. "Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024. doi:10.1109/CVPRW63382.2024.00468

Markdown

[Hallmen et al. "Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction." IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024.](https://mlanthology.org/cvprw/2024/hallmen2024cvprw-unimodal/) doi:10.1109/CVPRW63382.2024.00468

BibTeX

@inproceedings{hallmen2024cvprw-unimodal,
  title     = {{Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction}},
  author    = {Hallmen, Tobias and Deuser, Fabian and Oswald, Norbert and André, Elisabeth},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops},
  year      = {2024},
  pages     = {4657-4665},
  doi       = {10.1109/CVPRW63382.2024.00468},
  url       = {https://mlanthology.org/cvprw/2024/hallmen2024cvprw-unimodal/}
}