Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving
Abstract
While the scaling laws of large language model (LLM) training have been extensively studied, optimal inference configurations of LLMs remain underexplored. We study inference scaling laws and compute-optimal inference, focusing on the trade-offs between model sizes and generating additional tokens with different inference strategies. As a first step towards understanding and designing compute-optimal inference methods, we study cost-performance trade-offs for inference strategies such as greedy search, majority voting, best-of-$n$, weighted voting, and two different tree search algorithms, using different model sizes and compute budgets. Our findings indicate that smaller models (e.g., Llemma-7B) can outperform larger models given the same computation budgets, and that smaller models paired with advanced inference algorithms yield Pareto-optimal cost-performance trade-offs. For instance, the Llemma-7B model, equipped with our novel tree search algorithm, consistently outperforms Llemma-34B with standard majority voting on the MATH benchmark across all FLOPs budgets. We hope these findings contribute to a broader understanding of inference scaling laws for LLMs.
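The abstract contrasts several sampling-based inference strategies (majority voting, best-of-$n$, weighted voting). As a rough illustration of how these selection rules differ, here is a minimal Python sketch over hypothetical sampled answers and reward-model scores; the function names, the example answers, and the scores are illustrative assumptions, not the authors' implementation or data.

```python
from collections import defaultdict

def majority_vote(answers):
    """Pick the most frequent final answer among sampled solutions."""
    counts = defaultdict(int)
    for a in answers:
        counts[a] += 1
    return max(counts, key=counts.get)

def weighted_vote(answers, scores):
    """Sum a per-solution score (e.g., from a reward model) for each distinct
    answer and pick the answer with the largest total."""
    totals = defaultdict(float)
    for a, s in zip(answers, scores):
        totals[a] += s
    return max(totals, key=totals.get)

def best_of_n(answers, scores):
    """Return the answer from the single highest-scoring sampled solution."""
    return max(zip(answers, scores), key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    # Hypothetical final answers and scores sampled for one MATH problem.
    answers = ["12", "12", "8", "12", "8"]
    scores = [0.4, 0.3, 0.9, 0.2, 0.7]
    print(majority_vote(answers))          # "12" (3 of 5 samples)
    print(weighted_vote(answers, scores))  # "8"  (0.9 + 0.7 > 0.4 + 0.3 + 0.2)
    print(best_of_n(answers, scores))      # "8"  (highest single score, 0.9)
```

Under a fixed FLOPs budget, the trade-off the paper studies is between spending compute on a larger model versus drawing more samples (or running tree search) with a smaller one and aggregating with rules like these.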
Cite
Text
Wu et al. "Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving." NeurIPS 2024 Workshops: MATH-AI, 2024.

Markdown

[Wu et al. "Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving." NeurIPS 2024 Workshops: MATH-AI, 2024.](https://mlanthology.org/neuripsw/2024/wu2024neuripsw-inference/)

BibTeX
@inproceedings{wu2024neuripsw-inference,
  title = {{Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving}},
  author = {Wu, Yangzhen and Sun, Zhiqing and Li, Shanda and Welleck, Sean and Yang, Yiming},
  booktitle = {NeurIPS 2024 Workshops: MATH-AI},
  year = {2024},
  url = {https://mlanthology.org/neuripsw/2024/wu2024neuripsw-inference/}
}