Unlearning Geo-Cultural Stereotypes in Multilingual LLMs

Abstract

As multilingual generative models become more widely used, most safety and fairness evaluation techniques still focus on English-language resources and overlook important cross-cultural factors. This limitation raises fairness and safety concerns, particularly around geoculturally situated stereotypes that undermine models' global inclusivity. In this work, we present preliminary findings on the impact of stereotype unlearning across languages, specifically English, French, and Hindi. Using an adapted version of the SeeGULL dataset, we analyze how unlearning stereotypes in one language influences other languages within multilingual large language models. Our study evaluates two models, Llama-3.1-8B and Aya-Expanse-8B, to assess whether unlearning in one linguistic context transfers across languages, potentially mitigating or exacerbating biases in multilingual settings.

Cite

Text

Farashah et al. "Unlearning Geo-Cultural Stereotypes in Multilingual LLMs." ICLR 2025 Workshops: BuildingTrust, 2025.

Markdown

[Farashah et al. "Unlearning Geo-Cultural Stereotypes in Multilingual LLMs." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/farashah2025iclrw-unlearning/)

BibTeX

@inproceedings{farashah2025iclrw-unlearning,
  title     = {{Unlearning Geo-Cultural Stereotypes in Multilingual LLMs}},
  author    = {Farashah, Alireza Dehghanpour and Khandelwal, Aditi and Rostamzadeh, Negar and Farnadi, Golnoosh},
  booktitle = {ICLR 2025 Workshops: BuildingTrust},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/farashah2025iclrw-unlearning/}
}