Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias

Abstract

Text generated by large language models (LLMs) has been shown to exhibit a variety of harmful, human-like biases against various demographic groups. These findings motivate research efforts aimed at understanding and measuring such effects. Prior work has proposed benchmarks for identifying these stereotypical associations and techniques for mitigating them. However, as recent research has pointed out, existing benchmarks lack a robust experimental setup, hindering the inference of meaningful conclusions from their evaluation metrics. In this paper, we first propose a causal framework and a list of desiderata for robustly measuring biases in generative language models. Building on these design principles, we introduce OccuGender, a benchmark with a bias-measuring procedure for investigating occupational gender bias. We then use this benchmark to test several state-of-the-art open-source LLMs, including Llama, Mistral, and their instruction-tuned versions. The results show that these models exhibit substantial occupational gender bias.
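The abstract summarizes a prompt-based bias-measuring procedure without detailing it here. The sketch below is a minimal illustration, not the paper's OccuGender benchmark: it probes occupational gender associations in an open-source causal language model by comparing the probabilities the model assigns to gendered pronouns immediately after an occupation-conditioned prompt. The model name ("gpt2" as a lightweight stand-in for Llama or Mistral), the prompt template, and the pronoun set are all illustrative assumptions.

# Illustrative sketch only -- not the paper's OccuGender benchmark or procedure.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # lightweight stand-in; the paper evaluates Llama, Mistral, etc.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def pronoun_probs(occupation: str) -> dict:
    """Next-token probabilities for gendered pronouns after an occupation prompt."""
    prompt = f"The {occupation} said that"     # assumed prompt template
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits at the next-token position
    probs = torch.softmax(logits, dim=-1)
    result = {}
    for pronoun in (" he", " she"):             # leading space matters for BPE tokenizers
        first_id = tokenizer.encode(pronoun)[0]  # use the first subword as a proxy
        result[pronoun.strip()] = probs[first_id].item()
    return result

for job in ("nurse", "engineer", "teacher", "plumber"):
    p = pronoun_probs(job)
    print(f"{job:10s} P(he)={p['he']:.4f}  P(she)={p['she']:.4f}")

Comparing these conditional probabilities across occupations gives only a rough indication of skew; the paper's causal framework and benchmark design address the robustness concerns that such simple probes raise.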

Cite

Text

Chen et al. "Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias." NeurIPS 2024 Workshops: CALM, 2024.

Markdown

[Chen et al. "Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias." NeurIPS 2024 Workshops: CALM, 2024.](https://mlanthology.org/neuripsw/2024/chen2024neuripsw-causally/)

BibTeX

@inproceedings{chen2024neuripsw-causally,
  title     = {{Causally Testing Gender Bias in LLMs: A Case Study on Occupational Bias}},
  author    = {Chen, Yuen and Raghuram, Vethavikashini Chithrra and Mattern, Justus and Mihalcea, Rada and Jin, Zhijing},
  booktitle = {NeurIPS 2024 Workshops: CALM},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/chen2024neuripsw-causally/}
}