Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs
Abstract
While search-augmented large language models (LLMs) exhibit impressive capabilities, their reliability in complex multi-hop reasoning remains limited. This limitation arises from three fundamental challenges: decomposition errors, where tasks are incorrectly broken down; missing retrieval, where key evidence fails to be retrieved; and reasoning errors, where flawed logic propagates through the reasoning chain. A single failure at any of these stages can derail the final answer. We propose Erasable Reinforcement Learning (ERL), a framework that explicitly identifies faulty steps, erases them, and regenerates the reasoning in place. This targeted correction turns brittle reasoning into a more resilient process. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle: the 3B model gains +8.48% EM and +11.56% F1, and the 7B model gains +5.38% EM and +7.22% F1, over previous state-of-the-art (SOTA) results. These findings suggest that erasable reinforcement learning is a powerful paradigm for robust multi-step reasoning in LLMs.
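The identify, erase, and regenerate idea can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the authors' training procedure: the abstract does not say how faulty steps are detected or how the reinforcement learning objective is defined. The names `erase_and_regenerate`, `is_faulty`, `regenerate`, and `max_retries` are hypothetical, and the sketch assumes that later steps are kept unchanged after a repair.

```python
from typing import Callable, List

Step = str


def erase_and_regenerate(
    steps: List[Step],
    is_faulty: Callable[[List[Step], int], bool],
    regenerate: Callable[[List[Step]], Step],
    max_retries: int = 3,
) -> List[Step]:
    """Repair a reasoning chain one step at a time (toy sketch only)."""
    repaired: List[Step] = []
    for step in steps:
        candidate = step
        retries = 0
        # Hypothetical check: judge the candidate given the steps kept so far.
        while retries < max_retries and is_faulty(repaired + [candidate], len(repaired)):
            # Erase the faulty candidate and regenerate it from the kept prefix.
            candidate = regenerate(repaired)
            retries += 1
        repaired.append(candidate)
    return repaired


# Toy stand-ins for a fault detector and a generator (both are assumptions).
def toy_is_faulty(chain: List[Step], idx: int) -> bool:
    return chain[idx].startswith("BAD")


def toy_regenerate(prefix: List[Step]) -> Step:
    return f"retry after {len(prefix)} kept step(s)"


chain = ["decompose: who directed film X?", "BAD retrieval", "answer: Y"]
fixed = erase_and_regenerate(chain, toy_is_faulty, toy_regenerate)
```

In this toy run only the second step is flagged, so it alone is replaced while the first and third steps pass through untouched. In a real system the detector and the generator would presumably both be driven by the model itself.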
Cite
Text
An et al. "Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs." International Conference on Learning Representations, 2026.
Markdown
[An et al. "Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/an2026iclr-erase/)
BibTeX
@inproceedings{an2026iclr-erase,
title = {{Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs}},
author = {An, Kang and Wang, Ziliang and Zheng, Xuhui and Qian, FaQiang and Zhang, WeiKun and Wang, Yuhang and Wu, Yichao},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/an2026iclr-erase/}
}