Trimodal Attention Module for Multimodal Sentiment Analysis (Student Abstract)

Abstract

In our research, we propose a new multimodal fusion architecture for the task of sentiment analysis. The three modalities used in this paper are text, audio, and video. Most current methods perform either feature-level or decision-level fusion. In contrast, we propose an attention-based deep neural network and a training approach that facilitate both feature-level and decision-level fusion. Our network effectively leverages information across all three modalities through a two-stage fusion process. We evaluate our network on utterance-level contextual features extracted from the CMU-MOSI dataset and draw a comparison between our network and the state of the art.
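To make the idea concrete, below is a minimal PyTorch sketch of a trimodal attention fusion block of the kind the abstract describes, not the authors' exact architecture. The feature dimensions (300 for text, 74 for audio, 35 for video), the shared projection size, and the way feature-level and decision-level predictions are combined are all illustrative assumptions.

```python
# A minimal sketch (not the paper's exact Trimodal Attention Module) of
# attention-based fusion over text, audio and video utterance features,
# combining a feature-level head with attention-weighted per-modality
# heads (decision-level fusion). All sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrimodalAttentionFusion(nn.Module):
    def __init__(self, d_text=300, d_audio=74, d_video=35, d_common=128, n_classes=2):
        super().__init__()
        # Stage 1: project each modality into a shared feature space
        self.proj = nn.ModuleDict({
            "text":  nn.Linear(d_text,  d_common),
            "audio": nn.Linear(d_audio, d_common),
            "video": nn.Linear(d_video, d_common),
        })
        # Attention scorer over the three projected modality features
        self.attn = nn.Linear(d_common, 1)
        # Feature-level head on the attended, fused representation
        self.fusion_head = nn.Linear(d_common, n_classes)
        # Per-modality heads whose logits are combined (decision-level fusion)
        self.modality_heads = nn.ModuleDict({
            m: nn.Linear(d_common, n_classes) for m in ("text", "audio", "video")
        })

    def forward(self, text, audio, video):
        feats = torch.stack([
            torch.tanh(self.proj["text"](text)),
            torch.tanh(self.proj["audio"](audio)),
            torch.tanh(self.proj["video"](video)),
        ], dim=1)                                     # (batch, 3, d_common)
        weights = F.softmax(self.attn(feats), dim=1)  # (batch, 3, 1)
        fused = (weights * feats).sum(dim=1)          # attended fusion

        # Stage 1 output: feature-level prediction from the fused representation
        feature_logits = self.fusion_head(fused)

        # Stage 2 output: attention-weighted combination of per-modality logits
        modality_logits = torch.stack(
            [self.modality_heads[m](feats[:, i])
             for i, m in enumerate(("text", "audio", "video"))],
            dim=1)                                    # (batch, 3, n_classes)
        decision_logits = (weights * modality_logits).sum(dim=1)

        # Combine the two stages; simple averaging is one possible choice
        return 0.5 * (feature_logits + decision_logits)

# Example usage with random utterance-level features
model = TrimodalAttentionFusion()
logits = model(torch.randn(8, 300), torch.randn(8, 74), torch.randn(8, 35))
print(logits.shape)  # torch.Size([8, 2])
```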

Cite

Text

Harish and Sadat. "Trimodal Attention Module for Multimodal Sentiment Analysis (Student Abstract)." AAAI Conference on Artificial Intelligence, 2020. doi:10.1609/AAAI.V34I10.7173

Markdown

[Harish and Sadat. "Trimodal Attention Module for Multimodal Sentiment Analysis (Student Abstract)." AAAI Conference on Artificial Intelligence, 2020.](https://mlanthology.org/aaai/2020/harish2020aaai-trimodal/) doi:10.1609/AAAI.V34I10.7173

BibTeX

@inproceedings{harish2020aaai-trimodal,
  title     = {{Trimodal Attention Module for Multimodal Sentiment Analysis (Student Abstract)}},
  author    = {Harish, Anirudh Bindiganavale and Sadat, Fatiha},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2020},
  pages     = {13803--13804},
  doi       = {10.1609/aaai.v34i10.7173},
  url       = {https://mlanthology.org/aaai/2020/harish2020aaai-trimodal/}
}