Rational Metareasoning for Large Language Models

Abstract

Reasoning has emerged as a core technique for improving large language model (LLM) performance across various tasks by using additional inference-time compute. However, as LLMs scale in both size and usage, inference costs are becoming increasingly burdensome. How, then, might we optimize the cost-performance tradeoff of reasoning? This work introduces a novel approach based on computational models of metareasoning used in cognitive science, training LLMs to selectively use intermediate reasoning steps only when necessary. We first develop a reward function that incorporates the Value of Computation by penalizing unnecessary reasoning, then use this reward function with Expert Iteration to train the LLM. Compared to few-shot chain-of-thought prompting, our approach significantly reduces inference costs (38% fewer tokens generated on average) without sacrificing task performance across diverse datasets.
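
The abstract only sketches the method, so the following is a minimal, hypothetical Python illustration of the two ideas it names: a Value-of-Computation style reward (task success minus a per-token cost for intermediate reasoning) and an Expert Iteration filtering step that keeps only the highest-reward sample per prompt for fine-tuning. The function names, the binary task reward, and the coefficient lambda_cost are assumptions for illustration, not the paper's exact formulation.

def voc_reward(answer_correct: bool, num_reasoning_tokens: int,
               lambda_cost: float = 1e-3) -> float:
    """Hypothetical VOC-style reward: task success minus a cost
    proportional to the number of intermediate reasoning tokens,
    so reasoning pays off only when it actually changes the outcome."""
    task_reward = 1.0 if answer_correct else 0.0
    return task_reward - lambda_cost * num_reasoning_tokens

def select_expert_examples(samples):
    """Schematic Expert Iteration selection step.

    samples: list of (prompt, response, answer_correct, n_reasoning_tokens)
    tuples drawn from the current model. Keeps the highest-VOC response
    per prompt; the kept (prompt, response) pairs would then be used
    for supervised fine-tuning in the next iteration."""
    best = {}
    for prompt, response, correct, n_tokens in samples:
        r = voc_reward(correct, n_tokens)
        if prompt not in best or r > best[prompt][1]:
            best[prompt] = ((prompt, response), r)
    return [pair for pair, _ in best.values()]

Under this kind of reward, a correct answer reached with no reasoning tokens outscores an equally correct answer reached with a long chain of thought, which is one plausible way to realize the "reason only when necessary" behavior the abstract describes.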

Cite

Text

De Sabbata et al. "Rational Metareasoning for Large Language Models." NeurIPS 2024 Workshops: Sys2-Reasoning, 2024.

Markdown

[De Sabbata et al. "Rational Metareasoning for Large Language Models." NeurIPS 2024 Workshops: Sys2-Reasoning, 2024.](https://mlanthology.org/neuripsw/2024/sabbata2024neuripsw-rational-a/)

BibTeX

@inproceedings{sabbata2024neuripsw-rational-a,
  title     = {{Rational Metareasoning for Large Language Models}},
  author    = {De Sabbata, C. Nicolò and Sumers, Theodore and Griffiths, Thomas L.},
  booktitle = {NeurIPS 2024 Workshops: Sys2-Reasoning},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/sabbata2024neuripsw-rational-a/}
}