Evaluating LLM Memorization Using Soft Token Sparsity

Abstract

Large language models (LLMs) memorize portions of their training data, posing threats to privacy and copyright protection. Existing work proposes several definitions of memorization, often with the goal of practical testing. In this work, we investigate compressive memorization and address its key limitation: computational inefficiency. To this end, we propose the adversarial sparsity ratio (ASR) as a proxy for compressive memorization. The ASR identifies sparse soft prompts that elicit target sequences, enabling a more computationally tractable assessment of memorization. Empirically, we show that ASR effectively distinguishes between memorized and non-memorized content. Furthermore, beyond verbatim memorization, ASR also captures memorization of underlying knowledge, offering a scalable and interpretable tool for analyzing memorization in LLMs.
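The abstract does not spell out how the adversarial sparsity ratio is computed, but the core idea it describes is finding the smallest soft prompt (a short sequence of learnable continuous embeddings) that makes the model reproduce a target sequence, and relating that prompt size to the target length. The sketch below is a rough, hypothetical illustration of that idea in PyTorch with Hugging Face Transformers; the function names (`min_soft_tokens`, `soft_prompt_loss`), the linear scan over prompt lengths, the loss threshold, and the final ratio are all assumptions for illustration, not the paper's actual ASR procedure or hyperparameters.

```python
# Hypothetical sketch: smallest soft prompt that elicits a target sequence.
# Not the authors' implementation; setup, thresholds, and names are assumed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def soft_prompt_loss(model, soft_prompt, target_ids, embed):
    """Language-modeling loss on the target, conditioned on the soft prompt."""
    tgt_embeds = embed(target_ids)                          # (1, T, d)
    inputs = torch.cat([soft_prompt, tgt_embeds], dim=1)    # (1, P+T, d)
    # Mask out the soft-prompt positions so only the target tokens are scored.
    ignore = torch.full((1, soft_prompt.size(1)), -100,
                        dtype=torch.long, device=target_ids.device)
    labels = torch.cat([ignore, target_ids], dim=1)
    return model(inputs_embeds=inputs, labels=labels).loss

def min_soft_tokens(model, tokenizer, target_text, max_tokens=32,
                    steps=200, lr=1e-2, loss_threshold=0.05):
    """Return the smallest soft-prompt length that (approximately) elicits
    `target_text`. A greedy linear scan; a binary search would also work."""
    device = next(model.parameters()).device
    model.eval()
    for param in model.parameters():          # freeze the model; only the
        param.requires_grad_(False)           # soft prompt is optimized
    embed = model.get_input_embeddings()
    target_ids = tokenizer(target_text, return_tensors="pt").input_ids.to(device)
    for prompt_len in range(1, max_tokens + 1):
        # Small init scale; initialization and step count matter in practice.
        soft = (0.02 * torch.randn(1, prompt_len, embed.embedding_dim,
                                   device=device)).requires_grad_(True)
        opt = torch.optim.Adam([soft], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = soft_prompt_loss(model, soft, target_ids, embed)
            loss.backward()
            opt.step()
        if loss.item() < loss_threshold:
            return prompt_len
    return None

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
target = "To be, or not to be, that is the question."
n_prompt = min_soft_tokens(model, tokenizer, target)
if n_prompt is not None:
    n_target = tokenizer(target, return_tensors="pt").input_ids.size(1)
    print(f"sparsity ratio ≈ {n_prompt / n_target:.2f} "
          f"({n_prompt} soft tokens / {n_target} target tokens)")
```

Under this reading, a sequence that the model has memorized should be recoverable from very few soft tokens (a small ratio), while a non-memorized sequence requires a prompt whose size approaches the target's own length.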

Cite

Text

Feng et al. "Evaluating LLM Memorization Using Soft Token Sparsity." ICLR 2025 Workshops: SLLM, 2025.

Markdown

[Feng et al. "Evaluating LLM Memorization Using Soft Token Sparsity." ICLR 2025 Workshops: SLLM, 2025.](https://mlanthology.org/iclrw/2025/feng2025iclrw-evaluating/)

BibTeX

@inproceedings{feng2025iclrw-evaluating,
  title     = {{Evaluating LLM Memorization Using Soft Token Sparsity}},
  author    = {Feng, Zhili and Xu, Yixuan Even and Maini, Pratyush and Robey, Alexander and Schwarzschild, Avi and Kolter, J Zico},
  booktitle = {ICLR 2025 Workshops: SLLM},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/feng2025iclrw-evaluating/}
}