FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents

Nandan Thakur, Jimmy Lin, Sam Havens, Michael Carbin, Omar Khattab, Andrew Drozdov

NeurIPS 2025

Abstract

We introduce FreshStack, a holistic framework for automatically building information retrieval (IR) evaluation benchmarks by incorporating challenging questions and answers. FreshStack proceeds in three steps: (1) automatic corpus collection from code and technical documentation, (2) nugget generation from community-asked questions and answers, and (3) nugget-level support, retrieving documents using a fusion of retrieval techniques and hybrid architectures. We use FreshStack to build five datasets on fast-growing, recent, and niche domains to ensure the tasks are sufficiently challenging. On FreshStack, existing retrieval models, when applied out-of-the-box, significantly underperform oracle approaches on all five domains, indicating substantial headroom to improve IR quality. In addition, we identify cases where rerankers do not improve first-stage retrieval accuracy (two out of five domains), and we find that oracle context helps an LLM generator produce a high-quality RAG answer. We hope FreshStack will facilitate future work toward constructing realistic, scalable, and uncontaminated IR and RAG evaluation benchmarks.

Cite

Text

Thakur et al. "FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents." Advances in Neural Information Processing Systems, 2025.

Markdown

[Thakur et al. "FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/thakur2025neurips-freshstack/)

BibTeX

@inproceedings{thakur2025neurips-freshstack,
  title     = {{FreshStack: Building Realistic Benchmarks for Evaluating Retrieval on Technical Documents}},
  author    = {Thakur, Nandan and Lin, Jimmy and Havens, Sam and Carbin, Michael and Khattab, Omar and Drozdov, Andrew},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/thakur2025neurips-freshstack/}
}