AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards
Abstract
Intelligent system audits are labor-intensive assurance activities that are typically performed once and then discarded, along with the opportunity to programmatically test all similar products on the market. This study illustrates how several incidents (i.e., harms) involving Named Entity Recognition (NER) could have been prevented by scaling up a previously performed audit of NER systems. The audit instrument's diagnostic capacity is maintained through a security model that protects the underlying data (i.e., addresses Goodhart's Law). An open-source evaluation infrastructure is released along with an example derived from a real-world audit that reports aggregated findings without exposing the underlying data.
Cite
Text
Chadda et al. "AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I21.30346

Markdown
[Chadda et al. "AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/chadda2024aaai-ai/) doi:10.1609/AAAI.V38I21.30346

BibTeX
@inproceedings{chadda2024aaai-ai,
title = {{AI Evaluation Authorities: A Case Study Mapping Model Audits to Persistent Standards}},
author = {Chadda, Arihant and McGregor, Sean and Hostetler, Jesse and Brennen, Andrea},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2024},
pages = {23035--23040},
doi = {10.1609/AAAI.V38I21.30346},
url = {https://mlanthology.org/aaai/2024/chadda2024aaai-ai/}
}