Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search
Abstract
In this work, we propose ***R**einforced **F**unctional **T**oken **T**uning* (RFTT), a novel reinforced fine-tuning framework that empowers Large Language Models (LLMs) with learn-to-reason capabilities. Unlike prior prompt-driven reasoning efforts, RFTT embeds a rich set of learnable functional tokens (*e.g.*, \<analyze\>, \<verify\>, \<refine\>) directly into the model vocabulary, enabling chain-of-thought construction with diverse human-like reasoning behaviors. Specifically, RFTT comprises two phases: (1) supervised fine-tuning performs prompt-driven tree search to obtain self-generated training data annotated with functional tokens, which warms up the model to learn these tokens for initial reasoning capability; and (2) online reinforcement learning further allows the model to explore diverse reasoning pathways through functional token sampling without relying on prompts, thereby facilitating effective self-improvement for functional reasoning. Extensive experiments demonstrate the superiority of the proposed RFTT on mathematical benchmarks and highlight its strong generalization capability to other general domains. Moreover, the performance of RFTT exhibits consistent gains with increased test-time computation through additional search rollouts. Our code and dataset are available at https://github.com/sastpg/RFTT.
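As a rough illustration of what embedding functional tokens into the vocabulary and sampling among them could look like, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the helper names (`extend_vocab`, `functional_token_probs`, `sample_functional_token`), the dictionary-based vocabulary with contiguous ids, and the choice to renormalize logits over functional tokens only are all assumptions made for illustration. The abstract names only three of the tokens and does not specify the sampling rule.

```python
import math
import random

# Functional tokens named in the abstract; the paper's full set may be larger.
FUNCTIONAL_TOKENS = ["<analyze>", "<verify>", "<refine>"]


def extend_vocab(vocab, new_tokens):
    """Return a copy of `vocab` with `new_tokens` appended under fresh ids.

    Assumes (hypothetically) that existing ids are contiguous from 0.
    """
    extended = dict(vocab)
    for tok in new_tokens:
        if tok not in extended:
            extended[tok] = len(extended)
    return extended


def functional_token_probs(logits, vocab, tokens):
    """Softmax over the logits of the functional tokens only."""
    scores = [logits[vocab[t]] for t in tokens]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return {t: e / total for t, e in zip(tokens, exps)}


def sample_functional_token(logits, vocab, tokens, rng):
    """Draw one functional token according to its renormalized probability."""
    probs = functional_token_probs(logits, vocab, tokens)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Toy usage: a three-word vocabulary and made-up next-token logits.
vocab = extend_vocab({"the": 0, "answer": 1, "is": 2}, FUNCTIONAL_TOKENS)
logits = [0.1, 0.3, 0.2, 1.5, 0.5, 0.5]  # one logit per vocabulary id
rng = random.Random(0)
next_step = sample_functional_token(logits, vocab, FUNCTIONAL_TOKENS, rng)
```

In a real model the new tokens would also need trainable embedding rows, which is the part the supervised warm-up phase is described as learning.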
Cite
Text
Zhang et al. "Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search." International Conference on Learning Representations, 2026.
Markdown
[Zhang et al. "Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/zhang2026iclr-incentivizing/)
BibTeX
@inproceedings{zhang2026iclr-incentivizing,
title = {{Incentivizing LLM Reasoning via Reinforcement Learning with Functional Monte Carlo Tree Search}},
author = {Zhang, Kongcheng and Yao, Qi and Lai, Baisheng and Huang, Jiaxing and Fang, Wenkai and Tao, Dacheng and Song, Mingli and Liu, Shunyu},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/zhang2026iclr-incentivizing/}
}