Rational Metareasoning for Large Language Models

Abstract

Reasoning has emerged as a core technique for improving large language model (LLM) performance across various tasks by using additional inference-time compute. However, as LLMs scale in both size and usage, inference costs are becoming increasingly burdensome. How, then, might we optimize the cost-performance tradeoff of reasoning? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting, our approach significantly reduces inference costs (38% fewer tokens generated on average) without sacrificing task performance across diverse datasets.
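
The abstract only sketches the method, so the following is a minimal, hypothetical Python illustration of the two ideas it names: a Value-of-Computation style reward (task success minus a per-token cost for intermediate reasoning) and an Expert Iteration filtering step that keeps only the highest-reward sample per prompt for fine-tuning. The function names, the binary task reward, and the coefficient lambda_cost are assumptions for illustration, not the paper's exact formulation.

def voc_reward(answer_correct: bool, num_reasoning_tokens: int,
               lambda_cost: float = 1e-3) -> float:
    """Hypothetical VOC-style reward: task success minus a cost
    proportional to the number of intermediate reasoning tokens,
    so reasoning pays off only when it actually changes the outcome."""
    task_reward = 1.0 if answer_correct else 0.0
    return task_reward - lambda_cost * num_reasoning_tokens

def select_expert_examples(samples):
    """Schematic Expert Iteration selection step.

    samples: list of (prompt, response, answer_correct, n_reasoning_tokens)
    tuples drawn from the current model. Keeps the highest-VOC response
    per prompt; the kept (prompt, response) pairs would then be used
    for supervised fine-tuning in the next iteration."""
    best = {}
    for prompt, response, correct, n_tokens in samples:
        r = voc_reward(correct, n_tokens)
        if prompt not in best or r > best[prompt][1]:
            best[prompt] = ((prompt, response), r)
    return [pair for pair, _ in best.values()]

Under this kind of reward, a correct answer reached with no reasoning tokens outscores an equally correct answer reached with a long chain of thought, which is one plausible way to realize the "reason only when necessary" behavior the abstract describes.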

Cite

Text

De Sabbata et al. "Rational Metareasoning for Large Language Models." NeurIPS 2024 Workshops: Sys2-Reasoning, 2024.

Markdown

[De Sabbata et al. "Rational Metareasoning for Large Language Models." NeurIPS 2024 Workshops: Sys2-Reasoning, 2024.](https://mlanthology.org/neuripsw/2024/sabbata2024neuripsw-rational-a/)

BibTeX

@inproceedings{sabbata2024neuripsw-rational-a,
  title     = {{Rational Metareasoning for Large Language Models}},
  author    = {De Sabbata, C. Nicolò and Sumers, Theodore and Griffiths, Thomas L.},
  booktitle = {NeurIPS 2024 Workshops: Sys2-Reasoning},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/sabbata2024neuripsw-rational-a/}
}