Predicting Emotions in User-Generated Videos

Abstract

User-generated video collections have been expanding rapidly in recent years, and systems for automatically analyzing these collections are in high demand. While extensive research efforts have been devoted to recognizing semantics like "birthday party" and "skiing", few attempts have been made to understand the emotions carried by the videos, e.g., "joy" and "sadness". In this paper, we propose a comprehensive computational framework for predicting emotions in user-generated videos. We first introduce a rigorously designed dataset collected from popular video-sharing websites with manual annotations, which can serve as a valuable benchmark for future research. A large set of features is extracted from this dataset, ranging from popular low-level visual descriptors and audio features to high-level semantic attributes. Results of a comprehensive set of experiments indicate that combining multiple types of features, such as the joint use of audio and visual cues, is important, and that attribute features, such as those containing sentiment-level semantics, are very effective.
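
The abstract's central finding is that fusing classifiers trained on different feature types (visual, audio, attributes) outperforms any single type. Below is a minimal late-fusion sketch in Python, not the authors' exact pipeline: it assumes pre-extracted feature matrices (the arrays here are random placeholders), trains one probabilistic SVM per feature type with scikit-learn, and averages the per-class scores at prediction time.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for 100 videos: low-level
# visual descriptors, audio features, and semantic attributes.
features = {
    "visual": rng.normal(size=(100, 128)),
    "audio": rng.normal(size=(100, 64)),
    "attributes": rng.normal(size=(100, 32)),
}
labels = rng.integers(0, 8, size=100)  # e.g., 8 emotion categories

# Train one probabilistic SVM per feature type.
models = {
    name: SVC(kernel="rbf", probability=True).fit(X, labels)
    for name, X in features.items()
}

def predict_fused(test_features):
    """Late fusion: average class-probability scores across feature types."""
    scores = np.mean(
        [models[name].predict_proba(X) for name, X in test_features.items()],
        axis=0,
    )
    return scores.argmax(axis=1)

Equal-weight averaging is only one fusion choice; per-feature weights (e.g., tuned on a validation set) or training a second-stage classifier on the concatenated scores are common alternatives.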

Cite

Text

Jiang et al. "Predicting Emotions in User-Generated Videos." AAAI Conference on Artificial Intelligence, 2014. doi:10.1609/AAAI.V28I1.8724

Markdown

[Jiang et al. "Predicting Emotions in User-Generated Videos." AAAI Conference on Artificial Intelligence, 2014.](https://mlanthology.org/aaai/2014/jiang2014aaai-predicting/) doi:10.1609/AAAI.V28I1.8724

BibTeX

@inproceedings{jiang2014aaai-predicting,
  title     = {{Predicting Emotions in User-Generated Videos}},
  author    = {Jiang, Yu-Gang and Xu, Baohan and Xue, Xiangyang},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2014},
  pages     = {73--79},
  doi       = {10.1609/AAAI.V28I1.8724},
  url       = {https://mlanthology.org/aaai/2014/jiang2014aaai-predicting/}
}