A Missing Testbed for LLM Pre-Training Membership Inference Attacks
Abstract
We introduce a simple and rigorous testbed for membership inference attacks (MIA) against pre-training sequences for large language models (LLMs). Our testbed addresses the following gaps in existing evaluations, which lack: (1) *uniform* sampling of member/non-member documents of varying lengths from pre-training shards; (2) large-scale *deduplication* at varying strengths, both within and across the sampled members/non-members; and (3) rigorous *statistical tests* to detect member/non-member distribution shifts that cause faulty evaluations and are otherwise imperceptible to the heuristic techniques used in prior work. We provide both global- and domain-level datasets (e.g., Reddit, Stack Exchange, Wikipedia), derived from fully open pre-trained LLM/dataset pairs including Pythia/Pile, Olmo/Dolma, and our custom GPT-2-Large pre-trained on FineWeb-Edu. We additionally open-source a modular and extensible codebase that facilitates the creation of custom, statistically validated, and deduplicated evaluation data using future open models and datasets. In sum, our work is a concrete step towards addressing the evaluation issues discussed by prior work.
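To make point (3) concrete, the sketch below illustrates one simple kind of statistical check for member/non-member distribution shift: a two-sample Kolmogorov-Smirnov test on document lengths. This is an illustrative example assuming SciPy, not the paper's released codebase; the function and variable names (`length_shift_test`, `members`, `non_members`) are hypothetical.

```python
# Minimal sketch (not the authors' code): detect a trivial length-distribution
# shift between member and non-member evaluation documents. Such a shift can
# let an "attack" succeed via a length cue rather than true membership signal.
from scipy.stats import ks_2samp


def length_shift_test(members, non_members, alpha=0.05):
    """Two-sample KS test on whitespace-token lengths of the two document sets."""
    member_lens = [len(doc.split()) for doc in members]
    non_member_lens = [len(doc.split()) for doc in non_members]
    stat, p_value = ks_2samp(member_lens, non_member_lens)
    return {"statistic": stat, "p_value": p_value, "shift_detected": p_value < alpha}


# Usage: a small p-value flags a distribution shift that would make the
# member/non-member split an unreliable MIA benchmark.
result = length_shift_test(["example member document ..."], ["example non-member ..."])
print(result)
```

A real validation pipeline would apply analogous tests to other features (e.g., timestamps or n-gram statistics), but the structure of the check is the same: compare the member and non-member samples and reject splits with detectable shift.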
Cite
Text
Jiang et al. "A Missing Testbed for LLM Pre-Training Membership Inference Attacks." ICLR 2025 Workshops: BuildingTrust, 2025.

Markdown
[Jiang et al. "A Missing Testbed for LLM Pre-Training Membership Inference Attacks." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/jiang2025iclrw-missing/)

BibTeX
@inproceedings{jiang2025iclrw-missing,
title = {{A Missing Testbed for LLM Pre-Training Membership Inference Attacks}},
author = {Jiang, Mingjian and Liu, Ken Ziyu and Koyejo, Sanmi},
booktitle = {ICLR 2025 Workshops: BuildingTrust},
year = {2025},
url = {https://mlanthology.org/iclrw/2025/jiang2025iclrw-missing/}
}