AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Abstract

Search-augmented LLMs often struggle with complex reasoning tasks due to ineffective multi-hop retrieval and limited reasoning ability. We propose AceSearcher, a cooperative self-play framework that trains a single large language model (LLM) to alternate between two roles: a decomposer that breaks down complex queries and a solver that integrates retrieved contexts for answer generation. AceSearcher couples supervised fine-tuning on a diverse mixture of search, reasoning, and decomposition tasks with reinforcement fine-tuning optimized for final answer accuracy, eliminating the need for intermediate annotations. Extensive experiments on three reasoning-intensive tasks across 10 datasets show that AceSearcher outperforms state-of-the-art baselines, achieving an average exact match improvement of 7.6%. Remarkably, on document-level finance reasoning tasks, AceSearcher-32B matches the performance of the giant DeepSeek-V3 model using less than 5% of its parameters. Even at smaller scales (1.5B and 8B), AceSearcher often surpasses existing search-augmented LLMs with up to 9× more parameters, highlighting its efficiency and effectiveness in tackling complex reasoning tasks.
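The abstract describes one model playing two roles (decomposer and solver) and being trained only on whether the final answer is correct. The Python sketch below illustrates that control flow under stated assumptions. It is not the authors' implementation. The toy decomposer, keyword-overlap retriever, and lookup-table solver (`toy_decompose`, `toy_retrieve`, `toy_solve`) are hypothetical stand-ins for LLM calls and a real retriever. The `#1` placeholder convention for chaining sub-answers is our own choice. Exact match with SQuAD-style normalization serves as the outcome reward.

```python
import re
import string
from typing import Callable, List

# Hypothetical interfaces. In AceSearcher the decomposer and solver are the
# SAME LLM prompted for two roles; here they are toy stubs so the flow runs.
Decomposer = Callable[[str], List[str]]
Retriever = Callable[[str], List[str]]
Solver = Callable[[str, List[str]], str]

_PUNCT = set(string.punctuation)


def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, strip punctuation and articles."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCT)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def run_episode(question: str, decompose: Decomposer,
                retrieve: Retriever, solve: Solver) -> str:
    """Decompose the question, then answer each sub-question with retrieval."""
    answers: List[str] = []
    for sub_q in decompose(question):
        # Fill placeholders such as "#1" with earlier sub-answers (assumed format).
        for i, ans in enumerate(answers, start=1):
            sub_q = sub_q.replace(f"#{i}", ans)
        answers.append(solve(sub_q, retrieve(sub_q)))
    return answers[-1] if answers else ""


def outcome_reward(question: str, gold: str, decompose: Decomposer,
                   retrieve: Retriever, solve: Solver) -> float:
    """Reward depends only on the final answer; sub-steps carry no labels."""
    return exact_match(run_episode(question, decompose, retrieve, solve), gold)


# ---- Toy stand-ins (hypothetical; not the paper's models or data) ----
CORPUS = [
    "Inception was directed by Christopher Nolan.",
    "Christopher Nolan was born in London.",
]
QUESTION = "Where was the director of Inception born?"


def toy_decompose(question: str) -> List[str]:
    # A real decomposer would generate these from the question.
    return ["Who directed Inception?", "Where was #1 born?"]


def toy_retrieve(query: str, k: int = 1) -> List[str]:
    # Rank passages by word overlap with the query (a crude sparse retriever).
    q = set(normalize(query).split())
    ranked = sorted(CORPUS, key=lambda p: len(q & set(normalize(p).split())),
                    reverse=True)
    return ranked[:k]


TOY_ANSWERS = {"Who directed Inception?": "Christopher Nolan",
               "Where was Christopher Nolan born?": "London"}


def toy_solve(sub_q: str, passages: List[str]) -> str:
    # Answer only if the retrieved context actually contains the answer.
    ans = TOY_ANSWERS.get(sub_q, "unknown")
    return ans if any(ans in p for p in passages) else "unknown"


if __name__ == "__main__":
    print(outcome_reward(QUESTION, "London", toy_decompose,
                         toy_retrieve, toy_solve))  # 1.0
```

In a reinforcement fine-tuning setup along these lines, a scalar like the one returned by `outcome_reward` would be computed for sampled decompositions and answers and passed to a policy-gradient update shared by both roles. That training loop is omitted here.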

Cite

Text

Xu et al. "AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play." Advances in Neural Information Processing Systems, 2025.

Markdown

[Xu et al. "AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/xu2025neurips-acesearcher/)

BibTeX

@inproceedings{xu2025neurips-acesearcher,
  title     = {{AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play}},
  author    = {Xu, Ran and Zhuang, Yuchen and Dong, Zihan and Wang, Ruiyu and Yu, Yue and Ho, Joyce C. and Zhang, Linjun and Wang, Haoyu and Shi, Wenqi and Yang, Carl},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/xu2025neurips-acesearcher/}
}