Street: A Multi-Task Structured Reasoning and Explanation Benchmark
Abstract
We introduce STREET, a unified multi-task and multi-domain natural language reasoning and explanation benchmark. Unlike most existing question-answering (QA) datasets, we expect models to not only answer questions, but also produce step-by-step structured explanations describing how premises in the question are used to produce intermediate conclusions that can prove the correctness of a certain answer. We perform extensive evaluation with popular language models such as GPT-3 with few-shot prompting and fine-tuned T5. We find that these models still lag behind human performance when producing such structured reasoning steps. We believe this work will provide a way for the community to better train and test systems on multi-step reasoning and explanations in natural language.
Cite
Text
Ribeiro et al. "Street: A Multi-Task Structured Reasoning and Explanation Benchmark." International Conference on Learning Representations, 2023.
Markdown
[Ribeiro et al. "Street: A Multi-Task Structured Reasoning and Explanation Benchmark." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/ribeiro2023iclr-street/)
BibTeX
@inproceedings{ribeiro2023iclr-street,
title = {{Street: A Multi-Task Structured Reasoning and Explanation Benchmark}},
author = {Ribeiro, Danilo Neves and Wang, Shen and Ma, Xiaofei and Zhu, Henghui and Dong, Rui and Kong, Deguang and Burger, Juliette and Ramos, Anjelica and Huang, Zhiheng and Wang, William Yang and Karypis, George and Xiang, Bing and Roth, Dan},
booktitle = {International Conference on Learning Representations},
year = {2023},
url = {https://mlanthology.org/iclr/2023/ribeiro2023iclr-street/}
}