Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems
Abstract
Retrieval-Augmented Generation (RAG) systems enhance large language models (LLMs) by incorporating external knowledge bases, but this may expose them to extraction attacks, leading to potential copyright and privacy risks. Existing extraction methods, however, typically rely on malicious inputs such as prompt injection or jailbreaking, which makes them easy to catch with input- or output-level detection. In this paper, we introduce the **I**mplicit **K**nowledge **E**xtraction **A**ttack (**IKEA**), which conducts *Knowledge Extraction* on RAG systems through benign queries. Specifically, **IKEA** first leverages anchor concepts (keywords related to internal knowledge) to generate natural-looking queries, and then designs two mechanisms that lead anchor concepts to thoroughly "explore" the RAG's knowledge: (1) Experience Reflection Sampling, which samples anchor concepts based on past query-response histories, ensuring their relevance to the topic; and (2) Trust Region Directed Mutation, which iteratively mutates anchor concepts under similarity constraints to further exploit the embedding space. Extensive experiments demonstrate **IKEA**'s effectiveness under various defenses, surpassing baselines by over 80% in extraction efficiency and 90% in attack success rate. Moreover, the substitute RAG system built from **IKEA**'s extractions performs close to the original RAG and outperforms substitutes built from baseline extractions across multiple evaluation tasks, underscoring the stealthy copyright infringement risk in RAG systems.
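To make the phrase "mutates anchor concepts under similarity constraints" more concrete, the following is a minimal illustrative sketch, not the paper's algorithm. The abstract gives no implementation details, so everything here is an assumption: the function name `mutate_anchor`, the `min_sim` threshold, the rule of taking the farthest candidate that still lies inside the similarity region, and the toy two-dimensional embeddings in `EMBED`.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mutate_anchor(current, candidates, visited, embed, min_sim=0.5):
    """Pick the next anchor concept inside a similarity region around `current`.

    Assumed rule (hypothetical, for illustration only): among unvisited
    candidates whose cosine similarity to `current` is at least `min_sim`,
    return the one that is least similar to `current`, i.e. the largest
    step that still satisfies the constraint. Returns None if no
    candidate qualifies.
    """
    in_region = [
        c for c in candidates
        if c != current
        and c not in visited
        and cosine(embed[c], embed[current]) >= min_sim
    ]
    if not in_region:
        return None
    return min(in_region, key=lambda c: cosine(embed[c], embed[current]))


# Toy embeddings (made up); a real system would use a text embedding model.
EMBED = {
    "diabetes": (1.0, 0.0),
    "insulin": (0.9, 0.3),
    "glucose": (0.6, 0.8),
    "football": (0.0, 1.0),
}

next_anchor = mutate_anchor("diabetes", list(EMBED), set(), EMBED)
print(next_anchor)
```

In this toy setup, raising `min_sim` shrinks the region and forces smaller steps, while the `visited` set keeps the search from returning to concepts that were already queried. How the paper actually sizes the region or chooses the mutation direction is not stated in the abstract.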
Cite
Text
Wang et al. "Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems." International Conference on Learning Representations, 2026.
Markdown
[Wang et al. "Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems." International Conference on Learning Representations, 2026.](https://mlanthology.org/iclr/2026/wang2026iclr-silent/)
BibTeX
@inproceedings{wang2026iclr-silent,
title = {{Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems}},
author = {Wang, Yuhao and Qu, Wenjie and Zhai, Shengfang and Jiang, Yanze and Zichen, Liu and Liu, Yue and Dong, Yinpeng and Zhang, Jiaheng},
booktitle = {International Conference on Learning Representations},
year = {2026},
url = {https://mlanthology.org/iclr/2026/wang2026iclr-silent/}
}