Decoding Safety Feedback from Diverse Raters: A Data-Driven Lens on Responsiveness to Severity
Abstract
Ensuring the safety of Generative AI requires a nuanced understanding of pluralistic viewpoints. In this paper, we introduce a novel data-driven approach for analyzing ordinal safety ratings in pluralistic settings. Specifically, we address the challenge of interpreting nuanced differences in safety feedback from a diverse population expressed via ordinal scales (e.g., a Likert scale). We define non-parametric responsiveness metrics that quantify how raters convey broader distinctions and granular variations in the severity of safety violations. Leveraging publicly available datasets of pluralistic safety feedback as our case studies, we investigate how raters from different demographic groups use an ordinal scale to express their perceptions of the severity of violations. We apply our metrics across violation types, demonstrating their utility in extracting nuanced insights that are crucial for aligning AI systems reliably in multi-cultural contexts. We show that our approach can inform rater selection and feedback interpretation by capturing nuanced viewpoints across different demographic groups, hence improving the quality of pluralistic data collection and in turn contributing to more robust AI alignment.
Cite
Text
Mishra et al. "Decoding Safety Feedback from Diverse Raters: A Data-Driven Lens on Responsiveness to Severity." Transactions on Machine Learning Research, 2026.
Markdown
[Mishra et al. "Decoding Safety Feedback from Diverse Raters: A Data-Driven Lens on Responsiveness to Severity." Transactions on Machine Learning Research, 2026.](https://mlanthology.org/tmlr/2026/mishra2026tmlr-decoding/)
BibTeX
@article{mishra2026tmlr-decoding,
title = {{Decoding Safety Feedback from Diverse Raters: A Data-Driven Lens on Responsiveness to Severity}},
author = {Mishra, Pushkar and Rastogi, Charvi and Pfohl, Stephen R and Parrish, Alicia and Teh, Tian Huey and Patel, Roma and Diaz, Mark and Wang, Ding and Paganini, Michela and Prabhakaran, Vinodkumar and Aroyo, Lora and Rieser, Verena},
journal = {Transactions on Machine Learning Research},
year = {2026},
url = {https://mlanthology.org/tmlr/2026/mishra2026tmlr-decoding/}
}