RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?

de Wynter, Adrian; Watts, Ishaan; Wongsangaroonsri, Tua; Zhang, Minghui; Farra, Noura; Altintoprak, Nektar Ege; Baur, Lena; Claudet, Samantha; Gajdusek, Pavel; Gu, Qilong; Kaminska, Anna; Kaminski, Tomasz; Kuo, Ruby; Kyuba, Akiko; Lee, Jongho; Mathur, Kartik; Merok, Petter; Milovanovic, Ivana; Paananen, Nani; Paananen, Vesa-Matti; Pavlenko, Anna; Vidal, Bruno Pereira; Strika, Luciano Ivan; Tsao, Yueh; Turcato, Davide; Vakhno, Oleksandr; Velcsov, Judit; Vickers, Anna; Visser, Stéphanie F.; Widarmanto, Herdyan; Zaikin, Andrey; Chen, Si-Qing

doi:10.1609/AAAI.V39I27.35011

AAAI 2025 pp. 27940-27950

Abstract

Large language models (LLMs) and small language models (SLMs) are being adopted at remarkable speed, although their safety remains a serious concern. With the advent of multilingual S/LLMs, the question becomes one of scale: can we expand multilingual safety evaluations of these models at the same velocity at which they are deployed? To this end, we introduce RTP-LX, a human-transcreated and human-annotated corpus of toxic prompts and outputs in 28 languages. RTP-LX follows participatory design practices, and a portion of the corpus is specifically designed to detect culturally specific toxic language. We evaluate 10 S/LLMs on their ability to detect toxic content in a culturally sensitive, multilingual scenario. We find that, although they typically score acceptably in terms of accuracy, they have low agreement with human judges when holistically scoring the toxicity of a prompt, and they have difficulty discerning harm in context-dependent scenarios, particularly with subtle-yet-harmful content (e.g., microaggressions, bias). We release this dataset to help further reduce harmful uses of these models and improve their safe deployment.

Cite

Text

de Wynter et al. "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I27.35011

Markdown

[de Wynter et al. "RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?" AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/dewynter2025aaai-rtp/) doi:10.1609/AAAI.V39I27.35011

BibTeX

@inproceedings{dewynter2025aaai-rtp,
  title     = {{RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?}},
  author    = {de Wynter, Adrian and Watts, Ishaan and Wongsangaroonsri, Tua and Zhang, Minghui and Farra, Noura and Altintoprak, Nektar Ege and Baur, Lena and Claudet, Samantha and Gajdusek, Pavel and Gu, Qilong and Kaminska, Anna and Kaminski, Tomasz and Kuo, Ruby and Kyuba, Akiko and Lee, Jongho and Mathur, Kartik and Merok, Petter and Milovanovic, Ivana and Paananen, Nani and Paananen, Vesa-Matti and Pavlenko, Anna and Vidal, Bruno Pereira and Strika, Luciano Ivan and Tsao, Yueh and Turcato, Davide and Vakhno, Oleksandr and Velcsov, Judit and Vickers, Anna and Visser, Stéphanie F. and Widarmanto, Herdyan and Zaikin, Andrey and Chen, Si-Qing},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {27940--27950},
  doi       = {10.1609/AAAI.V39I27.35011},
  url       = {https://mlanthology.org/aaai/2025/dewynter2025aaai-rtp/}
}