Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching
Abstract
Targeted Multimodal Sentiment Classification (TMSC) aims to identify the sentiment polarity of each target mentioned in a sentence-image pair. Existing methods for TMSC fail to explicitly capture both coarse-grained and fine-grained image-target matching, namely 1) the relevance between the image and the target and 2) the alignment between visual objects and the target. To tackle this issue, we propose a new multi-task learning architecture named the coarse-to-fine grained Image-Target Matching network (ITM), which jointly performs image-target relevance classification, object-target alignment, and targeted sentiment classification. We further construct an Image-Target Matching dataset by manually annotating the image-target relevance and the visual object aligned with each input target. Experiments on two benchmark TMSC datasets show that our model consistently outperforms the baselines, achieves state-of-the-art results, and produces interpretable visualizations.
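The abstract describes a multi-task objective that jointly optimizes three tasks. As a minimal sketch (the loss names and weighting scheme below are assumptions for illustration, not the paper's notation), the overall training loss might combine the targeted sentiment loss with the two auxiliary image-target matching losses:

```python
# Hypothetical sketch of a joint multi-task objective: targeted sentiment
# classification plus the two auxiliary image-target matching tasks
# (relevance classification and object-target alignment). The weights
# alpha and beta are illustrative assumptions.

def joint_loss(l_sentiment: float, l_relevance: float, l_alignment: float,
               alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weighted sum of the three per-task losses (hypothetical weights)."""
    return l_sentiment + alpha * l_relevance + beta * l_alignment

# Example: the main sentiment loss dominates; the auxiliary matching
# tasks contribute scaled regularizing terms.
total = joint_loss(1.2, 0.8, 0.4)  # 1.2 + 0.5*0.8 + 0.5*0.4 = 1.8
```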
Cite
Text
Yu et al. "Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching." International Joint Conference on Artificial Intelligence, 2022. doi:10.24963/IJCAI.2022/622
Markdown
[Yu et al. "Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching." International Joint Conference on Artificial Intelligence, 2022.](https://mlanthology.org/ijcai/2022/yu2022ijcai-targeted/) doi:10.24963/IJCAI.2022/622
BibTeX
@inproceedings{yu2022ijcai-targeted,
title = {{Targeted Multimodal Sentiment Classification Based on Coarse-to-Fine Grained Image-Target Matching}},
author = {Yu, Jianfei and Wang, Jieming and Xia, Rui and Li, Junjie},
booktitle = {International Joint Conference on Artificial Intelligence},
year = {2022},
pages = {4482--4488},
doi = {10.24963/IJCAI.2022/622},
url = {https://mlanthology.org/ijcai/2022/yu2022ijcai-targeted/}
}