Visual Emotion Representation Learning via Emotion-Aware Pre-Training

Abstract

Despite recent progress in deep learning, visual emotion recognition remains a challenging problem due to the ambiguity of emotion perception, the diversity of concepts related to visual emotion, and the lack of large-scale annotated datasets. In this paper, we present a large-scale multimodal pre-training method that learns visual emotion representations by aligning (emotion, object, attribute) triplets with a contrastive loss. We pre-train on a large web dataset with noisy tags and fine-tune on visual emotion classification datasets. Our method achieves state-of-the-art performance on visual emotion classification.
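To make the alignment objective concrete, below is a minimal PyTorch-style sketch of a symmetric contrastive (InfoNCE) loss over a batch of image embeddings and embeddings of their matching (emotion, object, attribute) tag phrases. The function name, the encoders assumed to produce the embeddings, and the temperature value are illustrative assumptions, not the paper's exact formulation.

  import torch
  import torch.nn.functional as F

  def contrastive_alignment_loss(image_emb, triplet_emb, temperature=0.07):
      """Align image embeddings with embeddings of (emotion, object,
      attribute) tag phrases. Matched pairs lie on the diagonal of the
      similarity matrix; all other in-batch pairs act as negatives.
      Sketch only: encoders and temperature are assumptions."""
      # L2-normalize so the dot product is a cosine similarity.
      image_emb = F.normalize(image_emb, dim=-1)
      triplet_emb = F.normalize(triplet_emb, dim=-1)

      # (batch, batch) similarity matrix, scaled by temperature.
      logits = image_emb @ triplet_emb.t() / temperature
      targets = torch.arange(logits.size(0), device=logits.device)

      # Symmetric cross-entropy over image-to-text and text-to-image.
      return (F.cross_entropy(logits, targets) +
              F.cross_entropy(logits.t(), targets)) / 2

In the paper's setting, the triplet embedding would come from a text encoder over the noisy web tags; any pair of encoders producing same-dimension embeddings fits this sketch.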

Cite

Text

Zhang et al. "Visual Emotion Representation Learning via Emotion-Aware Pre-Training." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/234

Markdown

[Zhang et al. "Visual Emotion Representation Learning via Emotion-Aware Pre-Training." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/zhang2022ijcai-visual/) doi:10.24963/IJCAI.2022/234

BibTeX

@inproceedings{zhang2022ijcai-visual,
  title     = {{Visual Emotion Representation Learning via Emotion-Aware Pre-Training}},
  author    = {Zhang, Yue and Ding, Wanying and Xu, Ran and Hu, Xiaohua},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2022},
  pages     = {1679--1685},
  doi       = {10.24963/IJCAI.2022/234},
  url       = {https://mlanthology.org/ijcai/2022/zhang2022ijcai-visual/}
}