Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks

Abstract

Watermarking is widely regarded as a promising defense against the misuse of large language models (LLMs); however, existing methods are fundamentally constrained by their vulnerability to scrubbing and spoofing attacks. This vulnerability stems from an inherent trade-off governed by watermark window size: smaller windows resist scrubbing better but are easier to reverse-engineer, enabling low-cost statistics-based spoofing attacks. This work expands the trade-off boundary by introducing a novel mechanism, equivalent texture keys, where multiple tokens within a watermark window can independently support detection. Building on this redundancy, we propose a watermark scheme with **S**ub-vocabulary decomposed **E**quivalent t**E**xture **K**ey (**SEEK**). SEEK achieves a Pareto improvement, enhancing robustness to scrubbing attacks without sacrificing resistance to spoofing.

Cite

Text

Shen et al. "Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks." Advances in Neural Information Processing Systems, 2025.

Markdown

[Shen et al. "Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks." Advances in Neural Information Processing Systems, 2025.](https://mlanthology.org/neurips/2025/shen2025neurips-enhancing/)

BibTeX

@inproceedings{shen2025neurips-enhancing,
  title     = {{Enhancing LLM Watermark Resilience Against Both Scrubbing and Spoofing Attacks}},
  author    = {Shen, Huanming and Huang, Baizhou and Wan, Xiaojun},
  booktitle = {Advances in Neural Information Processing Systems},
  year      = {2025},
  url       = {https://mlanthology.org/neurips/2025/shen2025neurips-enhancing/}
}