REST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent

Abstract

Answering complex natural language questions often necessitates multi-step reasoning and integrating external information. Several systems have combined knowledge retrieval with a large language model (LLM) to answer such questions. These systems, however, suffer from various failure cases, and we cannot directly train them end-to-end to fix such failures, as interaction with external knowledge is non-differentiable. To address these deficiencies, we define a ReAct-style LLM agent with the ability to reason and act upon external knowledge. We further refine the agent through a ReST-like method that iteratively trains on previous trajectories, employing growing-batch reinforcement learning with AI feedback for continuous self-improvement and self-distillation. Starting from a prompted large model and after just two iterations of the algorithm, we can produce a fine-tuned small model that achieves comparable performance on challenging compositional question-answering benchmarks with two orders of magnitude fewer parameters.
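The abstract describes the method only at a high level. It has two parts: a ReAct-style agent that interleaves reasoning with tool calls, and a ReST-like loop that repeatedly samples trajectories, filters them with AI feedback, and fine-tunes on what survives. The sketch below illustrates the shape of that loop. It is not the authors' implementation, and everything in it is a toy stand-in:

- `KNOWLEDGE`, `search`, `react_episode`, `ai_feedback`, `fine_tune` and `rest_loop` are hypothetical names.
- The "policy" is a single number, `skill`. It stands in for an LLM and gives the probability that the agent issues a useful search query.
- `ai_feedback` is a trivial scorer standing in for an LLM-based ranker.
- `fine_tune` is a toy update rule standing in for actual fine-tuning.

The sketch also assumes that each iteration samples several trajectories per question, keeps the best-scoring one if it clears a threshold, and trains only on that iteration's selections.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Trajectory:
    question: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (kind, text) pairs
    answer: str = ""


# Toy "external knowledge" behind a non-differentiable search tool (hypothetical).
KNOWLEDGE: Dict[str, str] = {"capital of France": "Paris", "capital of Japan": "Tokyo"}


def search(query: str) -> str:
    return KNOWLEDGE.get(query, "no result")


def react_episode(skill: float, question: str, rng: random.Random) -> Trajectory:
    """One ReAct-style episode: Thought -> Action -> Observation -> Answer.

    `skill` is a toy proxy for model quality: the probability that the agent
    issues the right search query instead of an irrelevant one.
    """
    traj = Trajectory(question)
    traj.steps.append(("thought", f"I should look up: {question}"))
    query = question if rng.random() < skill else "something irrelevant"
    traj.steps.append(("action", f"search[{query}]"))
    observation = search(query)
    traj.steps.append(("observation", observation))
    traj.answer = observation  # toy agent answers with what it observed
    return traj


def ai_feedback(traj: Trajectory) -> float:
    """Toy stand-in for an LLM ranker: reward answers grounded in a real result."""
    return 0.0 if traj.answer == "no result" else 1.0


def fine_tune(skill: float, selected: List[Trajectory], num_questions: int) -> float:
    """Toy 'training' update: skill moves toward 1 in proportion to kept data."""
    kept_fraction = len(selected) / max(num_questions, 1)
    return skill + (1.0 - skill) * 0.5 * kept_fraction


def rest_loop(questions: List[str], skill: float, iterations: int = 2,
              samples_per_question: int = 4, threshold: float = 0.5,
              seed: int = 0) -> Tuple[float, List[Tuple[float, int]]]:
    rng = random.Random(seed)
    history: List[Tuple[float, int]] = []
    for _ in range(iterations):
        # Grow: sample several trajectories per question from the current policy.
        selected: List[Trajectory] = []
        for q in questions:
            candidates = [react_episode(skill, q, rng) for _ in range(samples_per_question)]
            # AI feedback: rank the candidates and keep the best if it clears the threshold.
            best = max(candidates, key=ai_feedback)
            if ai_feedback(best) >= threshold:
                selected.append(best)
        # Improve: "fine-tune" on the selected trajectories (toy update rule).
        skill = fine_tune(skill, selected, len(questions))
        history.append((skill, len(selected)))
    return skill, history


if __name__ == "__main__":
    final_skill, history = rest_loop(list(KNOWLEDGE), skill=0.3, iterations=2)
    print(final_skill, history)
```

In the setting the abstract describes, the `fine_tune` step would train a real model on the selected trajectories. Pointing that step at a much smaller student model turns the same loop into self-distillation, which is how the abstract reports reaching comparable performance with two orders of magnitude fewer parameters.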

Cite

Text

Aksitov et al. "REST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent." ICLR 2024 Workshops: LLMAgents, 2024.

Markdown

[Aksitov et al. "REST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent." ICLR 2024 Workshops: LLMAgents, 2024.](https://mlanthology.org/iclrw/2024/aksitov2024iclrw-rest/)

BibTeX

@inproceedings{aksitov2024iclrw-rest,
  title     = {{REST Meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent}},
  author    = {Aksitov, Renat and Miryoosefi, Sobhan and Li, Zonglin and Li, Daliang and Babayan, Sheila and Kopparapu, Kavya and Fisher, Zachary and Guo, Ruiqi and Prakash, Sushant and Srinivasan, Pranesh and Zaheer, Manzil and Yu, Felix and Kumar, Sanjiv},
  booktitle = {ICLR 2024 Workshops: LLMAgents},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/aksitov2024iclrw-rest/}
}