DE-COP: Detecting Copyrighted Content in Language Models Training Data

Abstract

How can we detect whether copyrighted content was used to train a language model, given that the training data is typically undisclosed? We are motivated by the premise that a language model is likely to recognize verbatim excerpts from its training text. We propose DE-COP, a method to determine whether a piece of copyrighted content was included in training. DE-COP's core approach is to probe an LLM with multiple-choice questions whose options include both verbatim text and its paraphrases. We construct BookTection, a benchmark with excerpts from 165 books published prior and subsequent to a model's training cutoff, along with their paraphrases. Our experiments show that DE-COP outperforms the prior best method by 8.6% in detection accuracy (AUC) on models with logits available. Moreover, DE-COP achieves an average accuracy of 72% in detecting suspect books on fully black-box models, where prior methods attain approximately 0% accuracy. The code and datasets are available at https://github.com/LeiLiLab/DE-COP.
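The multiple-choice probing described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the four-option format, and the `score_fn` interface (a stand-in for querying a real LLM's logits or answers) are all assumptions.

```python
import random

def build_mcq(passage_options, book_title, seed=0):
    # passage_options: index 0 holds the verbatim excerpt, the rest are
    # paraphrases. Options are shuffled so the verbatim one is not always "A".
    rng = random.Random(seed)
    order = list(range(len(passage_options)))
    rng.shuffle(order)
    letters = "ABCD"[:len(passage_options)]
    lines = [f'Which passage is verbatim from the book "{book_title}"?']
    for letter, idx in zip(letters, order):
        lines.append(f"{letter}. {passage_options[idx]}")
    lines.append("Answer:")
    verbatim_letter = letters[order.index(0)]  # where the verbatim option landed
    return "\n".join(lines), verbatim_letter

def verbatim_pick_rate(score_fn, passage_options, book_title, n_trials=4):
    # score_fn(prompt, letter) -> the model's score for answering that letter
    # (hypothetical interface; e.g., a logit or log-probability per option).
    hits = 0
    for seed in range(n_trials):
        prompt, gold = build_mcq(passage_options, book_title, seed=seed)
        letters = "ABCD"[:len(passage_options)]
        scores = {l: score_fn(prompt, l) for l in letters}
        pred = max(scores, key=scores.get)
        hits += (pred == gold)
    # A rate well above chance suggests the excerpt was seen in training.
    return hits / n_trials
```

The detection signal is the model's preference for the verbatim option across shuffled orderings: an unexposed model should pick it at roughly chance level, while a model trained on the book should pick it substantially more often.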

Cite

Text

Duarte et al. "DE-COP: Detecting Copyrighted Content in Language Models Training Data." International Conference on Machine Learning, 2024.

Markdown

[Duarte et al. "DE-COP: Detecting Copyrighted Content in Language Models Training Data." International Conference on Machine Learning, 2024.](https://mlanthology.org/icml/2024/duarte2024icml-decop/)

BibTeX

@inproceedings{duarte2024icml-decop,
  title     = {{DE-COP: Detecting Copyrighted Content in Language Models Training Data}},
  author    = {Duarte, André Vicente and Zhao, Xuandong and Oliveira, Arlindo L. and Li, Lei},
  booktitle = {International Conference on Machine Learning},
  year      = {2024},
  pages     = {11940--11956},
  volume    = {235},
  url       = {https://mlanthology.org/icml/2024/duarte2024icml-decop/}
}