Retrieval Is Not Enough: Enhancing RAG Through Test-Time Critique and Optimization

Abstract

Retrieval-augmented generation (RAG) has become a widely adopted paradigm for enabling knowledge-grounded large language models (LLMs). However, standard RAG pipelines often fail to ensure that model reasoning remains consistent with the evidence retrieved, leading to factual inconsistencies or unsupported conclusions. In this work, we reinterpret RAG as \textit{Retrieval-Augmented Reasoning} and identify a central but underexplored problem: \textit{Reasoning Misalignment}—the divergence between an LLM's internal reasoning trajectory and the evidential constraints provided by retrieval. To address this issue, we propose \textsc{AlignRAG}, a novel iterative framework grounded in \textit{Critique-Driven Alignment (CDA)}. We further introduce \textsc{AlignRAG-auto}, an autonomous variant that dynamically terminates refinement, removing the need to pre-specify the number of critique iterations. At the heart of \textsc{AlignRAG} lies a \textit{contrastive critique synthesis} mechanism that generates retrieval-sensitive critiques while mitigating self-bias. This mechanism trains a dedicated retrieval-augmented \textit{Critic Language Model (CLM)} using labeled critiques that distinguish between evidence-aligned and misaligned reasoning. Empirical evaluations show that our approach significantly improves reasoning fidelity. Our 8B-parameter CLM improves performance over the Self-Refine baseline by \textbf{12.1\%} on out-of-domain tasks and outperforms a standard 72B-parameter CLM by \textbf{2.2\%}. Furthermore, \textsc{AlignRAG-auto} achieves this state-of-the-art performance while dynamically determining the optimal number of refinement steps, enhancing efficiency and usability. \textsc{AlignRAG} remains compatible with existing RAG architectures as a \textit{plug-and-play} module and demonstrates strong robustness under both informative and noisy retrieval scenarios. 
Overall, \textsc{AlignRAG} offers a principled solution for aligning model reasoning with retrieved evidence, substantially improving the factual reliability and robustness of RAG systems. Our source code is provided at \href{https://github.com/upup-wei/RAG-ReasonAlignment}{link}.
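The critique-and-refine loop described above can be sketched in a few lines. This is a minimal illustrative Python sketch, not the paper's implementation: the function names (`generate_answer`, `critic`, `align_rag_auto`), the string-based stand-ins for the generator and the Critic Language Model, and the "stop when the critic returns no critique" rule are all assumptions made for illustration of the dynamic-termination idea in \textsc{AlignRAG-auto}.

```python
# Hypothetical sketch of Critique-Driven Alignment (CDA) with dynamic
# termination. All names and behaviors here are illustrative stand-ins.

def generate_answer(query, evidence, critique=None):
    # Stand-in for the retrieval-augmented generator; a real system
    # would prompt an LLM with the query, evidence, and any critique.
    base = f"answer({query})"
    return base if critique is None else base + "+fix"

def critic(answer, evidence):
    # Stand-in for the Critic Language Model (CLM): returns a critique
    # string when reasoning diverges from the evidence, else None.
    return None if answer.endswith("+fix") else "unsupported claim"

def align_rag_auto(query, evidence, max_iters=4):
    """Refine the answer until the critic finds no misalignment
    (dynamic termination) or an iteration budget is exhausted."""
    answer = generate_answer(query, evidence)
    for _ in range(max_iters):
        critique = critic(answer, evidence)
        if critique is None:  # judged evidence-aligned: stop early
            return answer
        answer = generate_answer(query, evidence, critique=critique)
    return answer
```

In this toy loop, the number of refinement steps is decided at test time by the critic's verdict rather than fixed in advance, which is the efficiency property the autonomous variant targets.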

Cite

Text

Wei et al. "Retrieval Is Not Enough: Enhancing RAG Through Test-Time Critique and Optimization." Advances in Neural Information Processing Systems, 2025.

Markdown

[Wei et al. "Retrieval Is Not Enough: Enhancing RAG Through Test-Time Critique and Optimization." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/wei2025neurips-retrieval/)

BibTeX

@inproceedings{wei2025neurips-retrieval,
  title     = {{Retrieval Is Not Enough: Enhancing RAG Through Test-Time Critique and Optimization}},
  author    = {Wei, Jiaqi and Zhou, Hao and Zhang, Xiang and Zhang, Di and Qiu, Zijie and Wei, Noah and Li, Jinzhe and Ouyang, Wanli and Sun, Siqi},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/wei2025neurips-retrieval/}
}