MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations
Abstract
Existing benchmarks for multi-hop question answering (QA) primarily evaluate models on their ability to reason about entities and the relations between them, offering little insight into how models reason jointly over events and entities. In this paper, we introduce a novel semi-automatic question generation strategy that composes event structures from information extraction (IE) datasets, and we present the first Multi-hop Event-centric Question Answering (MEQA) benchmark. It contains (1) 2,243 challenging questions that require diverse, complex reasoning over entity-entity, entity-event, and event-event relations; and (2) a corresponding multi-step, QA-format event reasoning chain (explanation) that leads to the answer for each question. We also introduce two metrics for evaluating explanations: completeness and logical consistency. Comprehensive benchmarking and analysis show that MEQA is challenging for the latest state-of-the-art models, including large language models (LLMs), and that these models fall short of providing faithful explanations of the event-centric reasoning process.
Cite
Text
Li et al. "MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations." Neural Information Processing Systems, 2024. doi:10.52202/079017-4028
Markdown
[Li et al. "MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/li2024neurips-meqa/) doi:10.52202/079017-4028
BibTeX
@inproceedings{li2024neurips-meqa,
title = {{MEQA: A Benchmark for Multi-Hop Event-Centric Question Answering with Explanations}},
author = {Li, Ruosen and Wang, Zimu and Tran, Son Quoc and Xia, Lei and Du, Xinya},
booktitle = {Neural Information Processing Systems},
year = {2024},
doi = {10.52202/079017-4028},
url = {https://mlanthology.org/neurips/2024/li2024neurips-meqa/}
}