SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

Röttger, Paul; Pernisi, Fabio; Vidgen, Bertie; Hovy, Dirk

doi:10.1609/AAAI.V39I26.34975

AAAI 2025 pp. 27617-27627

Abstract

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety. However, much of this work has happened in parallel, and with very different goals in mind, ranging from the mitigation of near-term risks around bias and toxic content generation to the assessment of longer-term catastrophic risk potential. This makes it difficult for researchers and practitioners to find the most relevant datasets for their use case, and to identify gaps in dataset coverage that future work may fill. To remedy these issues, we conduct a first systematic review of open datasets for evaluating and improving LLM safety. We review 144 datasets, which we identified through an iterative and community-driven process over the course of several months. We highlight patterns and trends, such as a trend towards fully synthetic datasets, as well as gaps in dataset coverage, such as a clear lack of non-English and naturalistic datasets. We also examine how LLM safety datasets are used in practice -- in LLM release publications and popular LLM benchmarks -- finding that current evaluation practices are highly idiosyncratic and make use of only a small fraction of available datasets. Our contributions are based on SafetyPrompts.com, a living catalogue of open datasets for LLM safety, which we plan to update continuously as the field of LLM safety develops.

Cite

Text

Röttger et al. "SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I26.34975

Markdown

[Röttger et al. "SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/rottger2025aaai-safetyprompts/) doi:10.1609/AAAI.V39I26.34975

BibTeX

@inproceedings{rottger2025aaai-safetyprompts,
  title     = {{SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety}},
  author    = {Röttger, Paul and Pernisi, Fabio and Vidgen, Bertie and Hovy, Dirk},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {27617-27627},
  doi       = {10.1609/AAAI.V39I26.34975},
  url       = {https://mlanthology.org/aaai/2025/rottger2025aaai-safetyprompts/}
}