MChIRC: A Multimodal Benchmark for Chinese Idiom Reading Comprehension

Abstract

Large language models have greatly improved performance on many natural language processing tasks. However, they still struggle with certain linguistic phenomena, such as Chinese idioms, which usually consist of four characters and are hard to understand because of the semantic gap between their literal and actual meanings. The Chinese idiom reading comprehension task was proposed to examine how well large language models represent and understand these idioms: the model must choose the correct idiom from a list of candidates to complete a sentence. Current research focuses mainly on text-based idiom comprehension. Yet many idiom application scenarios combine images and text, and we believe the corresponding images help a model understand the idioms. To address this gap, we first construct a large-scale Multimodal Chinese Idiom Reading Comprehension dataset (MChIRC), which contains 44,433 image-text pairs covering 2,926 idioms. We then propose a Dual-Contrastive Idiom Graph Network (DCIGN), which uses a dual-contrastive learning module to align the text and image features of the same Chinese idiom at both coarse and fine levels, and a graph structure to capture the semantic relationships between idiom candidates. Finally, a cross-attention module fuses the multimodal features with the graph features of the candidate idioms to predict the correct answer. A variety of experiments demonstrate the reliability of MChIRC and the effectiveness of DCIGN, providing a new benchmark for the multimodal Chinese idiom reading comprehension task.
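
To make the task format concrete, the following is a minimal sketch of one multimodal cloze item and of how accuracy over candidate choices might be computed. It is not the authors' code or data format: the `#idiom#` placeholder, the field names, the image path, the example sentence, and the `toy_score` function are all assumptions introduced for illustration. A real system such as DCIGN would replace `toy_score` with a model that scores each candidate from the text and the image.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical placeholder token; the dataset's actual blank marker may differ.
BLANK = "#idiom#"


@dataclass
class IdiomItem:
    """One multimodal cloze item (field names are illustrative assumptions)."""
    sentence: str          # text containing one BLANK
    image_path: str        # image paired with the text
    candidates: List[str]  # candidate idioms
    answer: int            # index of the correct candidate


def fill_blank(item: IdiomItem, choice: int) -> str:
    """Return the sentence with the blank replaced by the chosen candidate."""
    return item.sentence.replace(BLANK, item.candidates[choice], 1)


def predict(item: IdiomItem, score: Callable[[IdiomItem, int], float]) -> int:
    """Pick the index of the candidate with the highest score."""
    return max(range(len(item.candidates)), key=lambda i: score(item, i))


def accuracy(items: Sequence[IdiomItem],
             score: Callable[[IdiomItem, int], float]) -> float:
    """Fraction of items whose predicted candidate matches the answer."""
    if not items:
        return 0.0
    correct = sum(predict(it, score) == it.answer for it in items)
    return correct / len(items)


# Made-up example item, not taken from MChIRC.
# Sentence: "He always ___ when doing things and finishes nothing in the end."
item = IdiomItem(
    sentence="他做事总是#idiom#，最后什么也没完成。",
    image_path="images/example.jpg",  # hypothetical path
    candidates=["半途而废", "一帆风顺", "画蛇添足"],
    answer=0,
)


def toy_score(it: IdiomItem, i: int) -> float:
    """Stand-in for a real model: favors one hard-coded idiom."""
    return 1.0 if it.candidates[i] == "半途而废" else 0.0


if __name__ == "__main__":
    print(fill_blank(item, predict(item, toy_score)))
    print(accuracy([item], toy_score))
```

Keeping the scorer as a plain callable separates the evaluation loop from the model, so a text-only baseline and a multimodal model can be compared on the same items.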

Cite

Text

Wang et al. "MChIRC: A Multimodal Benchmark for Chinese Idiom Reading Comprehension." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I24.34728

Markdown

[Wang et al. "MChIRC: A Multimodal Benchmark for Chinese Idiom Reading Comprehension." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-mchirc/) doi:10.1609/AAAI.V39I24.34728

BibTeX

@inproceedings{wang2025aaai-mchirc,
  title     = {{MChIRC: A Multimodal Benchmark for Chinese Idiom Reading Comprehension}},
  author    = {Wang, Tongguan and Wu, Mingmin and Su, Guixin and Su, Dongyu and Hu, Yuxue and Huang, Zhongqiang and Sha, Ying},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {25398-25406},
  doi       = {10.1609/AAAI.V39I24.34728},
  url       = {https://mlanthology.org/aaai/2025/wang2025aaai-mchirc/}
}