Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training

Abstract

Retrieval-Augmented Generation (RAG) methods improve LLM performance by efficiently filtering the input down to the relevant context, which reduces hallucinations and inference cost. However, most existing RAG methods perform single-step retrieval, which is often insufficient for complex questions that require multi-step search. Recent multi-step retrieval approaches typically fine-tune small LLMs to carry out the search. This type of fine-tuning is highly resource-intensive and does not permit the use of larger LLMs. In this work, we propose Q-RAG, a novel approach that fine-tunes the embedder model for multi-step retrieval using reinforcement learning (RL). Q-RAG offers a competitive, resource-efficient alternative to existing multi-step retrieval methods for open-domain question answering and achieves state-of-the-art results on the popular long-context benchmarks BABILong and RULER for contexts up to 10M tokens. Code is available at: https://github.com/griver/Q-RAG.

Cite

Text

Sorokin et al. "Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training." International Conference on Learning Representations, 2026.

Markdown

[Sorokin et al. "Q-RAG: Long Context Multi‑Step Retrieval via Value‑Based Embedder Training." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/sorokin2026iclr-qrag/)

BibTeX

@inproceedings{sorokin2026iclr-qrag,
  title     = {{Q-RAG: Long Context Multi-Step Retrieval via Value-Based Embedder Training}},
  author    = {Sorokin, Artyom and Buzun, Nazar and Anokhin, Aleksandr and Vedernikov, Egor Konstantinovich and Anokhin, Petr and Burtsev, Mikhail and Burnaev, Evgeny},
  booktitle = {International Conference on Learning Representations},
  year      = {2026},
  url       = {https://mlanthology.org/iclr/2026/sorokin2026iclr-qrag/}
}