ContextCite: Attributing Model Generation to Context

Abstract

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
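
For readers who want to try the released package, below is a minimal usage sketch in Python. The class name ContextCiter, the from_pretrained constructor, and the get_attributions method are assumptions drawn from the linked repository's README and may differ from the current API; the model name, context, and query are placeholders.

# Minimal sketch of using the context-cite package (pip install context-cite).
# ContextCiter, from_pretrained, and get_attributions are assumed from the
# repository README at https://github.com/MadryLab/context-cite and may
# differ from the current API.
from context_cite import ContextCiter

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # placeholder model
context = "ContextCite attributes a model's generated statements to sources in its context."
query = "What does ContextCite do?"

# Wrap an existing language model together with a context and a query.
cc = ContextCiter.from_pretrained(model_name, context, query)

# Inspect the generated response, then the context parts most responsible for it.
print(cc.response)
print(cc.get_attributions(as_dataframe=True, top_k=3))

The attributions rank parts of the context by how much they contributed to the generated statement, which is what enables the verification, pruning, and poisoning-detection applications described above.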

Cite

Text

Cohen-Wang et al. "ContextCite: Attributing Model Generation to Context." Neural Information Processing Systems, 2024. doi:10.52202/079017-3035

Markdown

[Cohen-Wang et al. "ContextCite: Attributing Model Generation to Context." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/cohenwang2024neurips-contextcite/) doi:10.52202/079017-3035

BibTeX

@inproceedings{cohenwang2024neurips-contextcite,
  title     = {{ContextCite: Attributing Model Generation to Context}},
  author    = {Cohen-Wang, Benjamin and Shah, Harshay and Georgiev, Kristian and Mądry, Aleksander},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3035},
  url       = {https://mlanthology.org/neurips/2024/cohenwang2024neurips-contextcite/}
}