$e^3$: Exploring Embodied Emotion Through a Large-Scale Egocentric Video Dataset

Abstract

Understanding human emotions is fundamental to enhancing human-computer interaction, especially for embodied agents that mimic human behavior. Traditional emotion analysis typically adopts a third-person perspective, limiting agents' ability to interact naturally and empathetically. To address this gap, this paper presents $E^3$ (Exploring Embodied Emotion), the first large-scale first-person-view video dataset. $E^3$ contains more than $50$ hours of video capturing $8$ emotion types across diverse scenarios and languages. The videos are recorded by individuals in their daily lives and capture a wide range of real-world emotions conveyed through visual, acoustic, and textual modalities. Leveraging this dataset, we define $4$ core benchmark tasks (emotion recognition, emotion classification, emotion localization, and emotion reasoning), supported by more than $80$k manually crafted annotations, providing a comprehensive resource for training and evaluating emotion analysis models. We further present Emotion-LlaMa, which complements the visual modality with the acoustic modality to enhance emotion understanding in first-person videos. Comparative experiments against a large number of baselines demonstrate the superiority of Emotion-LlaMa and establish a new benchmark for embodied emotion analysis. We expect $E^3$ to promote advances in multimodal understanding, robotics, and augmented reality, and to provide a solid foundation for developing more empathetic, context-aware embodied agents.

Cite

Text

Lin et al. "$e^3$: Exploring Embodied Emotion Through a Large-Scale Egocentric Video Dataset." Neural Information Processing Systems, 2024. doi:10.52202/079017-3753

Markdown

[Lin et al. "$e^3$: Exploring Embodied Emotion Through a Large-Scale Egocentric Video Dataset." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/lin2024neurips-exploring/) doi:10.52202/079017-3753

BibTeX

@inproceedings{lin2024neurips-exploring,
  title     = {{$e^3$: Exploring Embodied Emotion Through a Large-Scale Egocentric Video Dataset}},
  author    = {Lin, Wang and Feng, Yueying and Han, Wenkang and Jin, Tao and Zhao, Zhou and Wu, Fei and Yao, Chang and Chen, Jingyuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3753},
  url       = {https://mlanthology.org/neurips/2024/lin2024neurips-exploring/}
}