MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations

Abstract

Existing benchmarks for multi-hop question answering (QA) primarily evaluate models on their ability to reason about entities and the relationships between them. However, they offer little insight into how models perform when reasoning over both events and entities. In this paper, we introduce a novel semi-automatic question generation strategy that composes event structures from information extraction (IE) datasets, and we present the first Multi-hop Event-centric Question Answering (MEQA) benchmark. It contains (1) 2,243 challenging questions that require a diverse range of complex reasoning over entity-entity, entity-event, and event-event relations; and (2) for each question, a corresponding multi-step, QA-format event reasoning chain (explanation) that leads to the answer. We also introduce two metrics for evaluating explanations: completeness and logical consistency. Comprehensive benchmarking and analysis show that MEQA is challenging for the latest state-of-the-art models, including large language models (LLMs), and that they fall short of providing faithful explanations of the event-centric reasoning process.

Cite

Text

Li et al. "MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations." Neural Information Processing Systems, 2024. doi:10.52202/079017-4028

Markdown

[Li et al. "MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/li2024neurips-meqa/) doi:10.52202/079017-4028

BibTeX

@inproceedings{li2024neurips-meqa,
  title     = {{MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations}},
  author    = {Li, Ruosen and Wang, Zimu and Tran, Son Quoc and Xia, Lei and Du, Xinya},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-4028},
  url       = {https://mlanthology.org/neurips/2024/li2024neurips-meqa/}
}