DocMSU: A Comprehensive Benchmark for Document-Level Multimodal Sarcasm Understanding

Du, Hang; Nan, Guoshun; Zhang, Sicheng; Xie, Binzhu; Xu, Junrui; Fan, Hehe; Cui, Qimei; Tao, Xiaofeng; Jiang, Xudong

doi:10.1609/AAAI.V38I16.29748

AAAI 2024 pp. 17933-17941

Abstract

Multimodal Sarcasm Understanding (MSU) has a wide range of applications in the news field, such as public opinion analysis and forgery detection. However, existing MSU benchmarks and approaches usually focus on sentence-level MSU. In document-level news, sarcasm clues are sparse or small and are often concealed in long text. Moreover, compared to sentence-level comments like tweets, which mainly focus on only a few trends or hot topics (e.g., sports events), content in the news is considerably diverse. Models created for sentence-level MSU may fail to capture sarcasm clues in document-level news. To fill this gap, we present a comprehensive benchmark for Document-level Multimodal Sarcasm Understanding (DocMSU). Our dataset contains 102,588 pieces of news with text-image pairs, covering 9 diverse topics such as health and business. The proposed large-scale and diverse DocMSU significantly facilitates the research of document-level MSU in real-world scenarios. To take on the new challenges posed by DocMSU, we introduce a fine-grained sarcasm comprehension method to properly align the pixel-level image features with word-level textual features in documents. Experiments demonstrate the effectiveness of our method, showing that it can serve as a baseline approach to the challenging DocMSU.

Cite

Text

Du et al. "DocMSU: A Comprehensive Benchmark for Document-Level Multimodal Sarcasm Understanding." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I16.29748

Markdown

[Du et al. "DocMSU: A Comprehensive Benchmark for Document-Level Multimodal Sarcasm Understanding." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/du2024aaai-docmsu/) doi:10.1609/AAAI.V38I16.29748

BibTeX

@inproceedings{du2024aaai-docmsu,
  title     = {{DocMSU: A Comprehensive Benchmark for Document-Level Multimodal Sarcasm Understanding}},
  author    = {Du, Hang and Nan, Guoshun and Zhang, Sicheng and Xie, Binzhu and Xu, Junrui and Fan, Hehe and Cui, Qimei and Tao, Xiaofeng and Jiang, Xudong},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {17933--17941},
  doi       = {10.1609/AAAI.V38I16.29748},
  url       = {https://mlanthology.org/aaai/2024/du2024aaai-docmsu/}
}