IncDSI: Incrementally Updatable Document Retrieval

Abstract

Differentiable Search Index is a recently proposed paradigm for document retrieval that encodes information about a corpus of documents within the parameters of a neural network and directly maps queries to corresponding documents. These models have achieved state-of-the-art performance for document retrieval across many benchmarks. However, they have a significant limitation: it is not easy to add new documents after a model is trained. We propose IncDSI, a method to add documents in real time (about 20-50 ms per document) without retraining the model on the entire dataset (or even parts thereof). Instead, we formulate the addition of documents as a constrained optimization problem that makes minimal changes to the network parameters. Although orders of magnitude faster, our approach is competitive with retraining the model on the whole dataset and enables the development of document retrieval systems that can be updated with new information in real time. Our code for IncDSI is available at https://github.com/varshakishore/IncDSI.
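
To make the idea of adding a document through constrained optimization more concrete, the toy sketch below may help. It is not the authors' algorithm (see their repository for that) and rests on assumptions the abstract does not state: retrieval is modeled as an argmax over dot products between a fixed query embedding and one vector per document, and the existing document vectors are left untouched while a single new vector is fitted. The names `dot`, `retrieve`, and `add_document`, the margin, and the step size are all hypothetical choices made for illustration.

```python
def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(doc_vectors, query):
    """Return the index of the document vector with the highest score."""
    scores = [dot(v, query) for v in doc_vectors]
    return max(range(len(scores)), key=scores.__getitem__)


def add_document(doc_vectors, new_query, old_queries, old_labels,
                 margin=0.1, lr=0.1, steps=500):
    """Fit one new document vector v without changing existing vectors.

    Constraints (assumed for this toy, not taken from the paper):
      1. v scores at least `margin` above every old document on new_query.
      2. v scores at least `margin` below the correct old document on
         each old query, so old queries keep retrieving their documents.
    Violated constraints are reduced by subgradient steps on hinge terms.
    """
    v = list(new_query)  # start from the new query's embedding
    new_target = max(dot(w, new_query) for w in doc_vectors) + margin
    old_limits = [dot(doc_vectors[y], q) - margin
                  for q, y in zip(old_queries, old_labels)]
    for _ in range(steps):
        grad = [0.0] * len(v)
        violated = False
        if dot(v, new_query) < new_target:      # constraint 1 violated
            violated = True
            grad = [g - x for g, x in zip(grad, new_query)]
        for q, limit in zip(old_queries, old_limits):
            if dot(v, q) > limit:               # constraint 2 violated
                violated = True
                grad = [g + x for g, x in zip(grad, q)]
        if not violated:
            return doc_vectors + [v]
        v = [x - lr * g for x, g in zip(v, grad)]
    raise ValueError("constraints not satisfied within the step budget")


# Two existing documents, each with one query that retrieves it.
docs = [[1.0, 0.0], [0.0, 1.0]]
old_queries = [[1.0, 0.0], [0.0, 1.0]]
old_labels = [0, 1]

# Add a third document whose query embedding is [1.0, 1.0].
updated = add_document(docs, [1.0, 1.0], old_queries, old_labels)
```

In this toy setup only the new vector is optimized, which is one simple way to read "minimal changes to the network parameters"; the actual formulation and solver used by IncDSI may differ.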

Cite

Text

Kishore et al. "IncDSI: Incrementally Updatable Document Retrieval." International Conference on Machine Learning, 2023.

Markdown

[Kishore et al. "IncDSI: Incrementally Updatable Document Retrieval." International Conference on Machine Learning, 2023.](https://mlanthology.org/icml/2023/kishore2023icml-incdsi/)

BibTeX

@inproceedings{kishore2023icml-incdsi,
  title     = {{IncDSI: Incrementally Updatable Document Retrieval}},
  author    = {Kishore, Varsha and Wan, Chao and Lovelace, Justin and Artzi, Yoav and Weinberger, Kilian Q},
  booktitle = {International Conference on Machine Learning},
  year      = {2023},
  pages     = {17122-17134},
  volume    = {202},
  url       = {https://mlanthology.org/icml/2023/kishore2023icml-incdsi/}
}