VerMCTS: Synthesizing Multi-Step Programs Using a Verifier, a Large Language Model, and Tree Search

Abstract

Large Language Models (LLMs) can generate useful code, but often the code they generate cannot be trusted to be sound. In this paper, we present VerMCTS, an approach that begins to resolve this issue by generating verified programs in Dafny and Coq. VerMCTS uses a logical verifier in concert with an LLM to guide a modified Monte Carlo Tree Search (MCTS). This approach leverages the verifier to gain intermediate feedback inside the search algorithm by checking partial programs at each step to estimate an upper bound on the value function. To measure the performance of VerMCTS, we develop a new suite of multi-step verified programming problems in Dafny and Coq. In terms of pass@$T$, a new metric that computes the pass rate given a budget of $T$ tokens sampled from the LLM, VerMCTS leads to more than a 30% absolute increase in average pass@5000 across the suite over repeated sampling from the base language model.
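
To make the idea in the abstract concrete, here is a minimal, self-contained sketch of a UCT-style tree search over partial programs in which a verifier checks each partial program and supplies an optimistic value estimate (an upper bound) in place of a random rollout. Everything below, including `llm_propose`, `verify_partial`, and the toy Dafny snippet, is a hypothetical illustration under simplifying assumptions, not the authors' implementation.

import math

# Toy "target program": stands in for a verified Dafny method the search should reach.
TARGET = [
    "method Add(a: int, b: int) returns (c: int)",
    "  ensures c == a + b",
    "{ c := a + b; }",
]

def llm_propose(prefix):
    """Stand-in for LLM sampling: propose candidate next lines for the partial program."""
    depth = len(prefix)
    if depth >= len(TARGET):
        return []
    # One correct continuation plus a distractor that will fail verification.
    return [TARGET[depth], "  assert false;"]

def verify_partial(prefix):
    """Stand-in for a Dafny/Coq check of a partial program.
    Returns 'complete', 'plausible' (prefix still verifiable), or 'error'."""
    if any(line == "  assert false;" for line in prefix):
        return "error"
    if prefix == TARGET:
        return "complete"
    return "plausible"

class Node:
    def __init__(self, prefix, parent=None):
        self.prefix, self.parent = prefix, parent
        self.children, self.visits, self.value = [], 0, 0.0

def uct_select(node, c=1.4):
    # Standard UCT rule: exploit high average value, explore rarely visited children.
    return max(
        node.children,
        key=lambda ch: ch.value / (ch.visits + 1e-9)
        + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
    )

def vermcts_sketch(iterations=50):
    root = Node([])
    for _ in range(iterations):
        # Selection: descend to a leaf using UCT.
        node = root
        while node.children:
            node = uct_select(node)
        # Expansion: ask the LLM for candidate continuations of the partial program.
        for cont in llm_propose(node.prefix):
            node.children.append(Node(node.prefix + [cont], parent=node))
        # Evaluation: the verifier scores each new partial program instead of a rollout.
        for child in node.children or [node]:
            status = verify_partial(child.prefix)
            if status == "complete":
                return "\n".join(child.prefix)  # fully verified program found
            # Verifier acceptance gives an optimistic upper bound (1.0); failure prunes (0.0).
            reward = 1.0 if status == "plausible" else 0.0
            # Backpropagation: update statistics along the path to the root.
            n = child
            while n is not None:
                n.visits += 1
                n.value += reward
                n = n.parent
    return None

if __name__ == "__main__":
    print(vermcts_sketch())

Running the sketch prints the toy verified program after a handful of iterations; branches whose partial programs fail verification receive zero value and are quickly abandoned by UCT, which is the intuition behind using the verifier as an upper bound on the value function.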

Cite

Text

Brandfonbrener et al. "VerMCTS: Synthesizing Multi-Step Programs Using a Verifier, a Large Language Model, and Tree Search." NeurIPS 2024 Workshops: MATH-AI, 2024.

Markdown

[Brandfonbrener et al. "VerMCTS: Synthesizing Multi-Step Programs Using a Verifier, a Large Language Model, and Tree Search." NeurIPS 2024 Workshops: MATH-AI, 2024.](https://mlanthology.org/neuripsw/2024/brandfonbrener2024neuripsw-vermcts/)

BibTeX

@inproceedings{brandfonbrener2024neuripsw-vermcts,
  title     = {{VerMCTS: Synthesizing Multi-Step Programs Using a Verifier, a Large Language Model, and Tree Search}},
  author    = {Brandfonbrener, David and Henniger, Simon and Raja, Sibi and Prasad, Tarun and Loughridge, Chloe R. and Cassano, Federico and Hu, Sabrina Ruixin and Yang, Jianang and Byrd, William E. and Zinkov, Robert and Amin, Nada},
  booktitle = {NeurIPS 2024 Workshops: MATH-AI},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/brandfonbrener2024neuripsw-vermcts/}
}