Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge

Wallach, Hanna; Desai, Meera; Cooper, A. Feder; Wang, Angelina; Atalla, Chad; Barocas, Solon; Blodgett, Su Lin; Chouldechova, Alexandra; Corvi, Emily; Dow, P. Alex; Garcia-Gathright, Jean; Olteanu, Alexandra; Pangakis, Nicholas J; Reed, Stefanie; Sheng, Emily; Vann, Dan; Vaughan, Jennifer Wortman; Vogel, Matthew; Washington, Hannah; Jacobs, Abigail Z.

ICML 2025 pp. 82232-82251

Abstract

The measurement tasks involved in evaluating generative AI (GenAI) systems lack sufficient scientific rigor, leading to what has been described as "a tangle of sloppy tests [and] apples-to-oranges comparisons" (Roose, 2024). In this position paper, we argue that the ML community would benefit from learning from and drawing on the social sciences when developing and using measurement instruments for evaluating GenAI systems. Specifically, our position is that evaluating GenAI systems is a social science measurement challenge. We present a four-level framework, grounded in measurement theory from the social sciences, for measuring concepts related to the capabilities, behaviors, and impacts of GenAI systems. This framework has two important implications: First, it can broaden the expertise involved in evaluating GenAI systems by enabling stakeholders with different perspectives to participate in conceptual debates. Second, it brings rigor to both conceptual and operational debates by offering a set of lenses for interrogating validity.

Cite

Text

Wallach et al. "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge." Proceedings of the 42nd International Conference on Machine Learning, 2025.

Markdown

[Wallach et al. "Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge." Proceedings of the 42nd International Conference on Machine Learning, 2025.](https://mlanthology.org/icml/2025/wallach2025icml-position/)

BibTeX

@inproceedings{wallach2025icml-position,
  title     = {{Position: Evaluating Generative AI Systems Is a Social Science Measurement Challenge}},
  author    = {Wallach, Hanna and Desai, Meera and Cooper, A. Feder and Wang, Angelina and Atalla, Chad and Barocas, Solon and Blodgett, Su Lin and Chouldechova, Alexandra and Corvi, Emily and Dow, P. Alex and Garcia-Gathright, Jean and Olteanu, Alexandra and Pangakis, Nicholas J and Reed, Stefanie and Sheng, Emily and Vann, Dan and Vaughan, Jennifer Wortman and Vogel, Matthew and Washington, Hannah and Jacobs, Abigail Z.},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning},
  year      = {2025},
  pages     = {82232--82251},
  volume    = {267},
  url       = {https://mlanthology.org/icml/2025/wallach2025icml-position/}
}