AI Red Teaming Through the Lens of Measurement Theory

Abstract

AI red teaming, i.e., simulating attacks on computer systems to identify vulnerabilities and improve defenses, can yield both qualitative and quantitative information about generative AI (GenAI) system behaviors to inform system evaluations. This is a very broad mandate, which has led to critiques that red teaming is both everything and nothing. We believe there is a more fundamental problem: various forms of red teaming are increasingly being used to produce quantitative information that is then used to compare GenAI systems. This raises the question: (when) can the types of quantitative information that red-teaming activities produce actually be used to make meaningful comparisons of systems? To answer this question, we draw on ideas from measurement theory as developed in the quantitative social sciences, which offers a conceptual framework for understanding the conditions under which the numerical values resulting from quantifying a system's properties can be meaningfully compared. Through this lens, we explain why red-teaming attack success rate (ASR) metrics generally should not be compared across time, settings, or systems.
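To make the comparison problem concrete, here is a minimal sketch (not from the paper; all names, outcomes, and attack mixes are hypothetical) of how an ASR is typically computed: the fraction of attack attempts judged successful. Because the resulting number depends on the particular distribution of attacks attempted, two ASRs obtained under different attack mixes are not directly comparable, even against the same system.

def attack_success_rate(outcomes):
    # outcomes: list of booleans, True if an attack attempt was
    # judged successful by whatever grading procedure was used
    return sum(outcomes) / len(outcomes)

# Two hypothetical red-teaming runs against the SAME system, using
# different mixes of attack techniques. The ASRs differ even though
# the system is unchanged, illustrating why such numbers should not
# be compared across runs, settings, or systems without holding the
# attack distribution (and grading procedure) fixed.
run_a = [True, False, False, True, False]      # e.g., mostly jailbreak prompts
run_b = [True, True, True, False, True, True]  # e.g., mostly prompt injections

print(attack_success_rate(run_a))  # 0.4
print(attack_success_rate(run_b))  # ~0.83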

Cite

Text

Chouldechova et al. "AI Red Teaming Through the Lens of Measurement Theory." NeurIPS 2024 Workshops: SafeGenAi, 2024.

Markdown

[Chouldechova et al. "AI Red Teaming Through the Lens of Measurement Theory." NeurIPS 2024 Workshops: SafeGenAi, 2024.](https://mlanthology.org/neuripsw/2024/chouldechova2024neuripsw-ai/)

BibTeX

@inproceedings{chouldechova2024neuripsw-ai,
  title     = {{AI Red Teaming Through the Lens of Measurement Theory}},
  author    = {Chouldechova, Alexandra and Cooper, A. Feder and Palia, Abhinav and Vann, Dan and Atalla, Chad and Washington, Hannah and Sheng, Emily and Wallach, Hanna},
  booktitle = {NeurIPS 2024 Workshops: SafeGenAi},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/chouldechova2024neuripsw-ai/}
}