Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs

Abstract

While search-augmented large language models (LLMs) exhibit impressive capabilities, their reliability in complex multi-hop reasoning remains limited. This limitation arises from three fundamental challenges: decomposition errors, where tasks are incorrectly broken down; retrieval misses, where key evidence fails to be retrieved; and reasoning errors, where flawed logic propagates through the reasoning chain. A single failure in any of these stages can derail the final answer. We propose Erasable Reinforcement Learning (ERL), a novel framework that explicitly identifies faulty steps, erases them, and regenerates the reasoning in place. This targeted correction mechanism makes brittle reasoning chains more resilient to single-step failures. Models trained with ERL, termed ESearch, achieve substantial improvements on HotpotQA, MuSiQue, 2Wiki, and Bamboogle, with the 3B model achieving +8.48% EM and +11.56% F1, and the 7B model achieving +5.38% EM and +7.22% F1 over previous state-of-the-art (SOTA) results. These findings suggest that erasable reinforcement learning offers a powerful paradigm for robust multi-step reasoning in LLMs.
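
The identify, erase, and regenerate loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Step` type, the `erase_and_regenerate` function, the `max_repairs` budget, and the toy detector and regenerator are all hypothetical names introduced here for illustration. The sketch also assumes that only the flagged step is replaced and that the other steps are kept, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Step:
    """One step of a reasoning chain (hypothetical structure, not from the paper)."""
    kind: str      # assumed labels: "decompose", "retrieve", or "reason"
    content: str


def erase_and_regenerate(
    steps: List[Step],
    find_faulty: Callable[[List[Step]], Optional[int]],
    regenerate: Callable[[List[Step], Step], Step],
    max_repairs: int = 3,
) -> List[Step]:
    """Repeatedly locate a faulty step, erase it, and regenerate it in place.

    find_faulty returns the index of a faulty step, or None if the chain looks fine.
    regenerate receives the steps before the erased one plus the erased step itself.
    """
    repaired = list(steps)  # copy so the caller's chain is not mutated
    for _ in range(max_repairs):
        idx = find_faulty(repaired)
        if idx is None:
            break  # nothing left to repair
        erased = repaired[idx]
        # Assumption: the replacement takes the erased step's position.
        repaired[idx] = regenerate(repaired[:idx], erased)
    return repaired


# Toy stand-ins for a learned detector and a generator (assumptions for the demo).
def toy_find_faulty(steps: List[Step]) -> Optional[int]:
    for i, step in enumerate(steps):
        if step.content.startswith("BAD"):
            return i
    return None


def toy_regenerate(prefix: List[Step], erased: Step) -> Step:
    return Step(erased.kind, f"regenerated {erased.kind}")
```

In a real system the detector and regenerator would presumably be the trained model itself; here they are plain functions so that the control flow can be run and inspected in isolation.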

Cite

Text

An et al. "Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs." International Conference on Learning Representations, 2026.

Markdown

[An et al. "Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/an2026iclr-erase/)

BibTeX

@inproceedings{an2026iclr-erase,
  title     = {{Erase to Improve: Erasable Reinforcement Learning for Search-Augmented LLMs}},
  author    = {An, Kang and Wang, Ziliang and Zheng, Xuhui and Qian, FaQiang and Zhang, WeiKun and Wang, Yuhang and Wu, Yichao},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/an2026iclr-erase/}
}