Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters
Abstract
Chain-of-Thought (CoT) prompting, which encourages language models (LMs) to generate intermediate rationales for the final answer through in-context demonstrations, dramatically improves large LMs' ability to solve reasoning tasks. Despite its success, there is little understanding of what makes CoT prompting effective and which aspects of the demonstrated reasoning steps contribute to its performance. In this paper, we show that prompting with invalid demonstrations barely degrades CoT reasoning: it achieves over 80-90% of the performance obtained using the original CoT under various metrics, while the model still generates coherent lines of reasoning during inference. Further experiments show that other aspects of the rationales, such as being relevant to the query and correctly ordering the reasoning steps, are the actual keys to the effectiveness of CoT. Overall, these findings deepen our understanding of CoT prompting, while raising new questions about large LMs' capability to learn to reason in context and prompting reflections on benchmarking few-shot reasoning.
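To make the ablation concrete, here is a minimal sketch (not the authors' code; the demonstration text and the `build_prompt` helper are hypothetical illustrations) of building a few-shot CoT prompt from a valid demonstration versus one whose rationale is invalid but remains relevant to the query and keeps the reasoning steps in order:

```python
# Hypothetical sketch of the ablation described in the abstract, not the
# authors' implementation. A standard CoT demonstration is contrasted with
# one whose intermediate reasoning is invalid but preserves relevance to
# the query and the ordering of the reasoning steps.

VALID_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "6 tennis balls. 5 + 6 = 11. The answer is 11."
)

# Same question and step structure, but the arithmetic is wrong.
INVALID_DEMO = (
    "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?\n"
    "A: Roger started with 5 balls. 2 cans of 3 tennis balls each is "
    "5 tennis balls. 5 + 5 = 10. The answer is 11."
)

def build_prompt(demonstrations: list[str], query: str) -> str:
    """Concatenate few-shot CoT demonstrations with the test question."""
    return "\n\n".join(demonstrations + [f"Q: {query}\nA:"])

# The paper's finding: swapping VALID_DEMO for INVALID_DEMO in the prompt
# changes final-answer accuracy surprisingly little.
print(build_prompt([INVALID_DEMO], "A bakery sells 4 boxes of 6 muffins. "
                                   "How many muffins is that in total?"))
```

The invalid demonstration deliberately keeps everything except validity fixed, which is what lets the experiments isolate relevance and step ordering as the aspects that actually matter.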
Cite
Text
Wang et al. "Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters." ICLR 2023 Workshops: ME-FoMo, 2023.
Markdown
[Wang et al. "Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters." ICLR 2023 Workshops: ME-FoMo, 2023.](https://mlanthology.org/iclrw/2023/wang2023iclrw-understanding/)
BibTeX
@inproceedings{wang2023iclrw-understanding,
title = {{Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters}},
author = {Wang, Boshi and Min, Sewon and Deng, Xiang and Shen, Jiaming and Wu, You and Zettlemoyer, Luke and Sun, Huan},
booktitle = {ICLR 2023 Workshops: ME-FoMo},
year = {2023},
url = {https://mlanthology.org/iclrw/2023/wang2023iclrw-understanding/}
}