MIA-Tuner: Adapting Large Language Models as Pre-Training Text Detector

Abstract

The increasing parameter counts and expansive datasets of large language models (LLMs) highlight the urgent demand for a technical solution to audit the underlying privacy risks and copyright issues associated with LLMs. Existing studies have partially addressed this need through an exploration of the pre-training data detection problem, which is an instance of a membership inference attack (MIA). This problem involves determining whether a given piece of text was used during the pre-training phase of the target LLM. Although existing methods have designed various sophisticated MIA score functions to achieve considerable detection performance on pre-trained LLMs, achieving high-confidence detection and performing MIA on aligned LLMs remain challenging. In this paper, we propose MIA-Tuner, a novel instruction-based MIA method, which instructs LLMs themselves to serve as a more precise pre-training data detector internally, rather than designing an external MIA score function. Furthermore, we design two instruction-based safeguards to respectively mitigate the privacy risks posed by existing methods and by MIA-Tuner. To comprehensively evaluate the most recent state-of-the-art LLMs, we collect a more up-to-date MIA benchmark dataset, named WIKIMIA-24, to replace the widely adopted benchmark WIKIMIA. We conduct extensive experiments across various aligned and unaligned LLMs over the two benchmark datasets. The results demonstrate that MIA-Tuner increases the AUC of MIAs from 0.7 to a significantly higher level of 0.9.
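For context, the "external MIA score functions" that the abstract contrasts MIA-Tuner against typically score a text by its token-level log-probabilities under the target model. Below is a minimal, hedged sketch of one such baseline in the Min-K%-Prob style (averaging the lowest k-fraction of token log-probs); the function names, the `k` value, and the `threshold` are illustrative assumptions, not part of the paper's method.

```python
def min_k_score(token_logprobs, k=0.2):
    """Average of the lowest k-fraction of token log-probs (Min-K%-style).

    token_logprobs: per-token log-probabilities of the candidate text
    under the target LLM (assumed to be computed elsewhere).
    """
    n = max(1, int(len(token_logprobs) * k))  # number of tokens to keep
    lowest = sorted(token_logprobs)[:n]       # the k% least-likely tokens
    return sum(lowest) / n

def predict_member(token_logprobs, threshold=-3.0, k=0.2):
    # Higher (less negative) score -> the model found even its "hardest"
    # tokens likely, suggesting the text was seen during pre-training.
    # The threshold here is a hypothetical placeholder.
    return min_k_score(token_logprobs, k) > threshold
```

MIA-Tuner instead fine-tunes the LLM with instructions so the model itself answers the membership question, avoiding the need to hand-craft and threshold a score like the one above.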

Cite

Text

Fu et al. "MIA-Tuner: Adapting Large Language Models as Pre-Training Text Detector." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I26.34939

Markdown

[Fu et al. "MIA-Tuner: Adapting Large Language Models as Pre-Training Text Detector." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/fu2025aaai-mia/) doi:10.1609/AAAI.V39I26.34939

BibTeX

@inproceedings{fu2025aaai-mia,
  title     = {{MIA-Tuner: Adapting Large Language Models as Pre-Training Text Detector}},
  author    = {Fu, Wenjie and Wang, Huandong and Gao, Chen and Liu, Guanghua and Li, Yong and Jiang, Tao},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {27295--27303},
  doi       = {10.1609/AAAI.V39I26.34939},
  url       = {https://mlanthology.org/aaai/2025/fu2025aaai-mia/}
}