AlphaZero-like Tree-Search Can Guide Large Language Model Decoding and Training

Feng, Xidong; Wan, Ziyu; Wen, Muning; Wen, Ying; Zhang, Weinan; Wang, Jun

NeurIPSW 2023

Abstract

Large language models (LLMs) typically employ sampling or beam search, accompanied by prompts such as Chain-of-Thought (CoT), to boost reasoning and decoding ability. Recent work such as Tree-of-Thought (ToT) and Reasoning via Planning (RAP) aims to augment the reasoning capabilities of LLMs by using tree-search algorithms to guide multi-step reasoning. These methods mainly focus on LLMs' reasoning ability during inference and rely heavily on human-designed prompts to activate the LLM as a value function, and therefore lack general applicability and scalability. To address these limitations, we present an AlphaZero-like tree-search learning framework for LLMs (termed TS-LLM), systematically showing how tree-search with a learned value function can guide LLMs' decoding ability. TS-LLM distinguishes itself in two key ways: (1) By leveraging a learned value function, our approach can be applied generally to tasks beyond reasoning (such as RLHF alignment) and to LLMs of any size, without prompting advanced, large-scale models. (2) It can guide an LLM's decoding during both inference and training. Empirical evaluations across reasoning, planning, and RLHF alignment tasks validate the effectiveness of TS-LLM, even on trees with a depth of 64.
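
To make the idea of value-guided tree search concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple level-by-level (beam-style) search rather than full AlphaZero-style MCTS. The names `value_guided_search`, `propose`, `value`, and `is_terminal` are hypothetical; in a real system `propose` would sample candidate continuations from an LLM and `value` would be a learned value network, whereas here both are hand-written toy functions.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[str, ...]  # a partial output, stored as a tuple of tokens/steps


def value_guided_search(
    propose: Callable[[State], Sequence[str]],   # stand-in for the LLM policy
    value: Callable[[State], float],             # stand-in for a learned value function
    is_terminal: Callable[[State], bool],
    beam_width: int = 2,
    max_depth: int = 8,
) -> State:
    """Expand the tree one level at a time, keeping the best states by value."""
    frontier: List[State] = [()]
    finished: List[State] = []
    for _ in range(max_depth):
        children: List[State] = []
        for state in frontier:
            for action in propose(state):
                child = state + (action,)
                if is_terminal(child):
                    finished.append(child)
                else:
                    children.append(child)
        if not children:
            break
        # Prune: keep only the highest-value partial outputs.
        children.sort(key=value, reverse=True)
        frontier = children[:beam_width]
    candidates = finished or frontier
    return max(candidates, key=value)


# Toy task (an assumption for illustration): pick four digits from {1, 2, 3}
# whose sum is as close as possible to 10.
TARGET = 10


def toy_propose(state: State) -> Sequence[str]:
    return ["1", "2", "3"]


def toy_value(state: State) -> float:
    return -abs(TARGET - sum(int(tok) for tok in state))


def toy_is_terminal(state: State) -> bool:
    return len(state) == 4


result = value_guided_search(toy_propose, toy_value, toy_is_terminal)
print(result, sum(int(tok) for tok in result))
```

Because the search only depends on the three callables, the same loop could in principle be reused across tasks by swapping in a different proposal model and value estimate, which is the kind of generality the abstract attributes to a learned value function.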

Cite

Text

Feng et al. "AlphaZero-like Tree-Search Can Guide Large Language Model Decoding and Training." NeurIPS 2023 Workshops: FMDM, 2023.

BibTeX

@inproceedings{feng2023neuripsw-alphazerolike,
  title     = {{AlphaZero-like Tree-Search Can Guide Large Language Model Decoding and Training}},
  author    = {Feng, Xidong and Wan, Ziyu and Wen, Muning and Wen, Ying and Zhang, Weinan and Wang, Jun},
  booktitle = {NeurIPS 2023 Workshops: FMDM},
  year      = {2023},
  url       = {https://mlanthology.org/neuripsw/2023/feng2023neuripsw-alphazerolike/}
}