Predicting Emotions in Interpersonal Interaction Videos: I Know What You Feel
Abstract
Predicting emotional transitions in video content is crucial for applications such as multimedia indexing, affective computing, and human-computer interaction. In this paper, we introduce a novel approach that predicts the emotion of the next shot in a video clip by capturing the temporal dynamics of emotional changes. To support this research, we use a new dataset, Interpersonal relations with Multi-Emotions (IMEmo), which encompasses a diverse range of emotional expressions and transitions. Our proposed method processes video segments through a neural network architecture that combines visual features and emotion embeddings to model the progression of emotions over time. Specifically, we employ a state-of-the-art network (DAN) to extract visual features, which are then enhanced with emotion feature vectors derived from the corresponding emotion labels. These combined features are fed to a Long Short-Term Memory (LSTM) network that captures the temporal dependencies and emotional context across the video. The final output of the LSTM is classified into 16 distinct emotions, 6 basic emotions, and sentiment, yielding the predicted emotion of the next shot. Experiments on the IMEmo dataset demonstrate the effectiveness of our approach in forecasting future emotional states, showcasing its potential to advance emotion recognition in video content. This method provides a first comprehensive understanding of emotional dynamics, offering a preliminary improvement over existing techniques.
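The pipeline described in the abstract can be illustrated with a minimal PyTorch sketch. It is not the authors' implementation: the class name `NextShotEmotionPredictor`, the feature dimension of 512, the embedding and hidden sizes, and the use of concatenation to combine visual features with emotion embeddings are all assumptions made for illustration. The visual features are assumed to be precomputed per shot (for example by DAN), and only the 16-emotion label set is modeled.

```python
import torch
import torch.nn as nn


class NextShotEmotionPredictor(nn.Module):
    """Hypothetical sketch: per-shot visual features + emotion embeddings -> LSTM -> classifier."""

    def __init__(self, visual_dim=512, num_emotions=16, emb_dim=32, hidden_dim=128):
        super().__init__()
        # Learned vector for each emotion label (sizes are assumptions, not from the paper).
        self.emotion_embedding = nn.Embedding(num_emotions, emb_dim)
        # The LSTM models how the fused features evolve across consecutive shots.
        self.lstm = nn.LSTM(visual_dim + emb_dim, hidden_dim, batch_first=True)
        # Maps the last LSTM output to one score per emotion class.
        self.classifier = nn.Linear(hidden_dim, num_emotions)

    def forward(self, visual_feats, emotion_labels):
        # visual_feats: (batch, shots, visual_dim), precomputed per-shot features
        # emotion_labels: (batch, shots), integer emotion label of each observed shot
        emb = self.emotion_embedding(emotion_labels)
        # Concatenation is an assumed way of "combining" the two feature types.
        fused = torch.cat([visual_feats, emb], dim=-1)
        outputs, _ = self.lstm(fused)
        # Use the output at the final observed shot to predict the next shot's emotion.
        return self.classifier(outputs[:, -1])


# Usage with random placeholder data: 4 clips, 5 observed shots each.
model = NextShotEmotionPredictor()
visual_feats = torch.randn(4, 5, 512)
emotion_labels = torch.randint(0, 16, (4, 5))
logits = model(visual_feats, emotion_labels)
predicted = logits.argmax(dim=-1)  # predicted emotion index for the next shot of each clip
```

Training such a model would typically minimize a cross-entropy loss between `logits` and the ground-truth label of the shot that follows the observed sequence; the abstract does not specify the loss, so this too is an assumption.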
Cite
Text
Guerdelli et al. "Predicting Emotions in Interpersonal Interaction Videos: I Know What You Feel." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-91581-9_16
Markdown
[Guerdelli et al. "Predicting Emotions in Interpersonal Interaction Videos: I Know What You Feel." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/guerdelli2024eccvw-predicting/) doi:10.1007/978-3-031-91581-9_16
BibTeX
@inproceedings{guerdelli2024eccvw-predicting,
title = {{Predicting Emotions in Interpersonal Interaction Videos: I Know What You Feel}},
author = {Guerdelli, Hajer and Ferrari, Claudio and Berretti, Stefano and Barhoumi, Walid and Del Bimbo, Alberto},
booktitle = {European Conference on Computer Vision Workshops},
year = {2024},
pages = {229--243},
doi = {10.1007/978-3-031-91581-9_16},
url = {https://mlanthology.org/eccvw/2024/guerdelli2024eccvw-predicting/}
}