Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems

Abstract

Retrieval-Augmented Generation (RAG) improves Language Models (LMs) by incorporating external knowledge at test time to enable customized adaptation. We study the risk of datastore leakage in Retrieval-In-Context based RAG systems. We show that an adversary can exploit LMs' instruction-following capabilities to easily extract text data verbatim from the datastore of RAG systems built with instruction-tuned LMs via prompt injection. The vulnerability exists for a wide range of modern LMs that span Llama2, Mistral/Mixtral, Vicuna, SOLAR, WizardLM, Qwen1.5, and Platypus2, and the exploitability exacerbates as the model size scales up. Extending our study to production models GPTs, we design an attack that can cause datastore leakage with a 100\% success rate on 25 randomly selected customized GPTs within at most 2 queries, and we show that with only 100 questions generated by GPT-4, one can attack GPTs to extract 36\% text data verbatim from a book of 77,000 words.

Cite

Text

Qi et al. "Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems." ICLR 2024 Workshops: DPFM, 2024.

Markdown

[Qi et al. "Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems." ICLR 2024 Workshops: DPFM, 2024.](https://mlanthology.org/iclrw/2024/qi2024iclrw-follow/)

BibTeX

@inproceedings{qi2024iclrw-follow,
  title     = {{Follow My Instruction and Spill the Beans: Scalable Data Extraction from Retrieval-Augmented Generation Systems}},
  author    = {Qi, Zhenting and Zhang, Hanlin and Xing, Eric P. and Kakade, Sham M. and Lakkaraju, Himabindu},
  booktitle = {ICLR 2024 Workshops: DPFM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/qi2024iclrw-follow/}
}