Understanding Model Bias Requires Systematic Probing Across Tasks

Abstract

There is a growing body of literature exposing the social biases of LLMs. However, these works often focus on a specific protected group, a specific prompt type, and a specific decision task. Given the large and complex input-output space of LLMs, case-by-case analyses alone may not paint a complete picture of the systematic biases of these models. In this paper, we argue for broad and systematic bias probing. We propose to do so by comparing the distribution of outputs over a wide range of prompts, multiple protected attributes, and different realistic decision-making settings in the same application domain. We demonstrate this approach for three personalized healthcare advice-seeking settings. We argue that studying the complex patterns of bias across tasks helps us better anticipate how the behaviors (specifically, biased behaviors) of LLMs might generalize to new tasks.
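To make the idea of distribution-level probing concrete, the sketch below (not the authors' code) shows one way to compare a model's answer distributions across prompt variants that differ only in a protected attribute. The `query_model` stub, the prompt template, and the attribute values are illustrative assumptions, and total variation distance is just one of several possible distribution-comparison measures.

```python
import random
from collections import Counter

# Hypothetical stand-in for an LLM call; replace with a real API client.
def query_model(prompt: str) -> str:
    # Illustrative only: returns one of a few advice categories at random.
    return random.choice(["see a doctor", "rest at home", "call emergency services"])

def answer_distribution(prompt: str, n_samples: int = 200) -> dict:
    """Empirical distribution of model answers for one prompt."""
    counts = Counter(query_model(prompt) for _ in range(n_samples))
    total = sum(counts.values())
    return {answer: c / total for answer, c in counts.items()}

def total_variation(p: dict, q: dict) -> float:
    """Total variation distance between two answer distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Illustrative prompt template and protected-attribute values (assumptions).
template = "A {group} patient reports chest pain after exercise. What should they do?"
groups = ["65-year-old male", "65-year-old female"]

dists = {g: answer_distribution(template.format(group=g)) for g in groups}
gap = total_variation(dists[groups[0]], dists[groups[1]])
print(f"Total variation distance between groups: {gap:.3f}")
```

In practice one would repeat this over many prompt phrasings, attributes, and decision settings, which is the kind of broad probing the paper advocates.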

Cite

Text

Boussard et al. "Understanding Model Bias Requires Systematic Probing Across Tasks." NeurIPS 2024 Workshops: SoLaR, 2024.

Markdown

[Boussard et al. "Understanding Model Bias Requires Systematic Probing Across Tasks." NeurIPS 2024 Workshops: SoLaR, 2024.](https://mlanthology.org/neuripsw/2024/boussard2024neuripsw-understanding/)

BibTeX

@inproceedings{boussard2024neuripsw-understanding,
  title     = {{Understanding Model Bias Requires Systematic Probing Across Tasks}},
  author    = {Boussard, Soline and Su, Susannah Cheng and Zhao, Helen and Swaroop, Siddharth and Pan, Weiwei},
  booktitle = {NeurIPS 2024 Workshops: SoLaR},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/boussard2024neuripsw-understanding/}
}