Provable Robust Watermarking for AI-Generated Text

Abstract

As AI-generated text increasingly resembles human-written content, the ability to detect machine-generated text becomes crucial. To address this challenge, we present GPTWatermark, a robust and high-quality solution designed to ascertain whether a piece of text originates from a specific model. Our approach extends existing watermarking strategies and employs a fixed group design to enhance robustness against editing and paraphrasing attacks. We show that our watermarked language model enjoys strong provable guarantees on generation quality, correctness in detection, and security against evasion attacks. Experimental results on various large language models (LLMs) and diverse datasets demonstrate that our method achieves superior detection accuracy and comparable generation quality in perplexity, thus promoting the responsible use of LLMs.
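The fixed group design mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not specify these details. The sketch assumes that the "fixed group" is a single key-seeded split of the vocabulary into a green and a red set reused at every position, that generation adds a constant bias to green-token logits, and that detection uses a one-sided z-test on the green-token count. The function names and the parameters `key`, `gamma`, and `delta` are hypothetical.

```python
import math
import random


def green_list(vocab_size, key, gamma=0.5):
    """Return a fixed green set holding a gamma fraction of the vocabulary.

    Assumption: the split depends only on the secret key, not on the
    preceding tokens, so the same set is used at every position.
    """
    rng = random.Random(key)
    ids = list(range(vocab_size))
    rng.shuffle(ids)
    return set(ids[: int(gamma * vocab_size)])


def watermark_logits(logits, green, delta=2.0):
    """Add a constant bias delta to the logit of every green token."""
    return [x + delta if i in green else x for i, x in enumerate(logits)]


def z_score(token_ids, green, gamma=0.5):
    """Standardized count of green tokens in a candidate text.

    Without a watermark, each token is green with probability about gamma,
    so a large positive score suggests the text was watermarked.
    """
    n = len(token_ids)
    if n == 0:
        raise ValueError("token_ids must be non-empty")
    hits = sum(1 for t in token_ids if t in green)
    return (hits - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
```

Under these assumptions, detection needs only the key and the token ids of the text, not the model itself. Because the green set does not depend on neighboring tokens, replacing or reordering a few tokens changes the green count only slightly, which is one intuition for robustness to editing.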

Cite

Text

Zhao et al. "Provable Robust Watermarking for AI-Generated Text." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Zhao et al. "Provable Robust Watermarking for AI-Generated Text." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/zhao2023icmlw-provable/)

BibTeX

@inproceedings{zhao2023icmlw-provable,
  title     = {{Provable Robust Watermarking for AI-Generated Text}},
  author    = {Zhao, Xuandong and Ananth, Prabhanjan Vijendra and Li, Lei and Wang, Yu-Xiang},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/zhao2023icmlw-provable/}
}