PromptDSI: Prompt-Based Rehearsal-Free Continual Learning for Document Retrieval

Abstract

Differentiable Search Index (DSI) utilizes pre-trained language models to perform indexing and document retrieval via end-to-end learning without relying on external indexes. However, DSI requires full re-training to index new documents, causing significant computational inefficiencies. Continual learning (CL) offers a solution by enabling the model to incrementally update without full re-training. Existing CL solutions in document retrieval rely on memory buffers or generative models for rehearsal, which is infeasible when access to previous training data is restricted due to privacy concerns. To this end, we introduce PromptDSI, a prompt-based, rehearsal-free continual learning approach for document retrieval. PromptDSI follows the Prompt-based Continual Learning (PCL) framework, using learnable prompts to efficiently index new documents without accessing previous documents or queries. To improve retrieval latency, we remove the initial forward pass of PCL, which otherwise greatly increases training and inference time, with a negligible trade-off in performance. Additionally, we introduce a novel topic-aware prompt pool that employs neural topic embeddings as fixed keys, eliminating the instability of prompt key optimization while maintaining competitive performance with existing PCL prompt pools. In a challenging rehearsal-free continual learning setup, we demonstrate that PromptDSI variants outperform rehearsal-based baselines, match the strong cache-based baseline in mitigating forgetting, and significantly improve retrieval performance on new corpora.
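
The idea of a prompt pool with fixed keys can be illustrated with a small sketch. This is a toy under stated assumptions, not the authors' implementation: the abstract only says that neural topic embeddings serve as fixed keys, so the function names (`cosine`, `select_prompt`), the cosine-similarity matching rule, the string placeholders standing in for learnable prompts, and the 2-dimensional vectors are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_prompt(query_embedding, fixed_keys, prompts):
    """Pick the prompt whose fixed key is most similar to the query embedding.

    Assumption: keys are frozen vectors (e.g. topic embeddings) and are never
    updated here; only the prompts paired with them would be trained.
    """
    scores = [cosine(query_embedding, key) for key in fixed_keys]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, prompts[best]


# Hypothetical toy data: two topics, one placeholder prompt per topic.
topic_keys = [[1.0, 0.0], [0.0, 1.0]]
prompts = ["prompt_topic_0", "prompt_topic_1"]

index, prompt = select_prompt([0.2, 0.9], topic_keys, prompts)
```

Because the keys in this sketch are never optimized, prompt selection depends only on the query embedding and the frozen keys, which is one way to read the abstract's claim that fixed keys avoid the instability of prompt key optimization.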

Cite

Text

Huynh et al. "PromptDSI: Prompt-Based Rehearsal-Free Continual Learning for Document Retrieval." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06109-6_22

Markdown

[Huynh et al. "PromptDSI: Prompt-Based Rehearsal-Free Continual Learning for Document Retrieval." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/huynh2025ecmlpkdd-promptdsi/) doi:10.1007/978-3-032-06109-6_22

BibTeX

@inproceedings{huynh2025ecmlpkdd-promptdsi,
  title     = {{PromptDSI: Prompt-Based Rehearsal-Free Continual Learning for Document Retrieval}},
  author    = {Huynh, Tuan-Luc and Vu, Thuy-Trang and Wang, Weiqing and Wei, Yinwei and Le, Trung and Gasevic, Dragan and Li, Yuan-Fang and Do, Thanh-Toan},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {383--401},
  doi       = {10.1007/978-3-032-06109-6_22},
  url       = {https://mlanthology.org/ecmlpkdd/2025/huynh2025ecmlpkdd-promptdsi/}
}