A Missing Testbed for LLM Pre-Training Membership Inference Attacks

Abstract

We introduce a simple and rigorous testbed for membership inference attacks (MIA) against pre-training sequences of large language models (LLMs). Our testbed addresses the following gaps in existing evaluations, which lack: (1) *uniform* sampling of member/non-member documents of varying lengths from pre-training shards; (2) large-scale *deduplication* at varying strengths, both within and across the sampled members/non-members; and (3) rigorous *statistical tests* to detect member/non-member distribution shifts that cause faulty evaluations and are otherwise imperceptible to the heuristic techniques used in prior work. We provide both global- and domain-level datasets (e.g., Reddit, Stack Exchange, Wikipedia), derived from fully open pre-trained LLM/dataset pairs including Pythia/Pile, OLMo/Dolma, and a GPT-2-Large model we pre-trained on FineWeb-Edu. We additionally open-source a modular and extensible codebase that facilitates the creation of custom, statistically validated, and deduplicated evaluation data using future open models and datasets. In sum, our work is a concrete step towards addressing the evaluation issues discussed by prior work.
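To illustrate the kind of check item (3) refers to, below is a minimal sketch (not the authors' released code) of a two-sample test for member/non-member distribution shift. It assumes document length as the summary statistic and uses SciPy's Kolmogorov-Smirnov test; the function name, the 0.05 threshold, and the toy data are illustrative assumptions only.

```python
"""Hypothetical sketch: detect a member/non-member distribution shift
via a two-sample Kolmogorov-Smirnov test on document lengths."""
from scipy.stats import ks_2samp


def length_shift_test(member_docs, nonmember_docs, alpha=0.05):
    """Return the KS statistic, p-value, and whether a shift is flagged
    at significance level `alpha` (illustrative default)."""
    member_lens = [len(doc) for doc in member_docs]
    nonmember_lens = [len(doc) for doc in nonmember_docs]
    result = ks_2samp(member_lens, nonmember_lens)
    return {
        "ks_statistic": result.statistic,
        "p_value": result.pvalue,
        "shift_detected": result.pvalue < alpha,
    }


if __name__ == "__main__":
    # Toy data: members are slightly longer on average, a shift that a
    # faulty MIA evaluation could mistake for a membership signal.
    members = ["a" * n for n in range(100, 200)]
    nonmembers = ["a" * n for n in range(80, 180)]
    print(length_shift_test(members, nonmembers))
```

In practice one would apply such tests to several summary statistics (length, perplexity under a reference model, n-gram overlap, etc.) rather than length alone; this sketch only conveys the shape of the check.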

Cite

Text

Jiang et al. "A Missing Testbed for LLM Pre-Training Membership Inference Attacks." ICLR 2025 Workshops: BuildingTrust, 2025.

Markdown

[Jiang et al. "A Missing Testbed for LLM Pre-Training Membership Inference Attacks." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/jiang2025iclrw-missing/)

BibTeX

@inproceedings{jiang2025iclrw-missing,
  title     = {{A Missing Testbed for LLM Pre-Training Membership Inference Attacks}},
  author    = {Jiang, Mingjian and Liu, Ken Ziyu and Koyejo, Sanmi},
  booktitle = {ICLR 2025 Workshops: BuildingTrust},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/jiang2025iclrw-missing/}
}