RANa: Retrieval-Augmented Navigation

Abstract

Methods for navigation based on large-scale learning typically treat each episode as a new problem, where the agent is spawned with a clean memory in an unknown environment. While such generalization to unknown environments is extremely important, we claim that, in a realistic setting, an agent should be able to exploit information collected during earlier robot operations. We address this by introducing a new retrieval-augmented agent, trained with RL, capable of querying a database collected from previous episodes in the same environment and learning how to integrate this additional context information. We introduce a unique agent architecture for the general navigation task, evaluated on ImageNav, Instance-ImageNav, and ObjectNav. Our retrieval and context encoding methods are data-driven and employ vision foundation models (FMs) for both semantic and geometric understanding. We propose new benchmarks for these settings and show that retrieval enables zero-shot transfer across tasks and environments while significantly improving performance.
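The abstract describes the retrieval step only at a high level. The sketch below is a minimal illustration of what such a step could look like, not the paper's implementation: it assumes observations have already been encoded into fixed-size foundation-model feature vectors, uses cosine similarity as the matching score, and the name retrieve_context and all parameters are hypothetical.

import numpy as np

def retrieve_context(query: np.ndarray, database: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the k stored embeddings most similar to the query.

    query:    (d,) feature vector of the current observation.
    database: (n, d) feature vectors collected during earlier episodes
              in the same environment.
    """
    # Cosine similarity: L2-normalize both sides, then take dot products.
    q = query / (np.linalg.norm(query) + 1e-8)
    db = database / (np.linalg.norm(database, axis=1, keepdims=True) + 1e-8)
    scores = db @ q                  # (n,) similarity of each stored embedding
    top_k = np.argsort(-scores)[:k]  # indices of the k best matches
    return database[top_k]           # (k, d) retrieved context features

# Toy usage: 100 past-episode embeddings of dimension 256.
rng = np.random.default_rng(0)
database = rng.standard_normal((100, 256)).astype(np.float32)
query = rng.standard_normal(256).astype(np.float32)
context = retrieve_context(query, database, k=5)
print(context.shape)  # (5, 256)

In the setting the abstract describes, the database would hold embeddings gathered during earlier robot operations in the same environment, and the retrieved features would be passed to the RL-trained agent as additional context, with the integration itself learned rather than hand-designed as above.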

Cite

Text

Monaci et al. "RANa: Retrieval-Augmented Navigation." Transactions on Machine Learning Research, 2025.

Markdown

[Monaci et al. "RANa: Retrieval-Augmented Navigation." Transactions on Machine Learning Research, 2025.](https://mlanthology.org/tmlr/2025/monaci2025tmlr-rana/)

BibTeX

@article{monaci2025tmlr-rana,
  title     = {{RANa: Retrieval-Augmented Navigation}},
  author    = {Monaci, Gianluca and Rezende, Rafael S. and Deffayet, Romain and Csurka, Gabriela and Bono, Guillaume and Déjean, Hervé and Clinchant, Stéphane and Wolf, Christian},
  journal   = {Transactions on Machine Learning Research},
  year      = {2025},
  url       = {https://mlanthology.org/tmlr/2025/monaci2025tmlr-rana/}
}