AutoBiasTest: Controllable Test Sentence Generation for Open-Ended Social Bias Testing in Language Models at Scale

Abstract

Social bias in Pretrained Language Models (PLMs) affects text generation and other downstream NLP tasks. Existing bias testing methods rely predominantly on manual templates or on expensive crowd-sourced data. We propose AutoBiasTest, a novel method that automatically generates controlled sentences for testing bias in PLMs, providing a flexible and low-cost alternative. Our approach uses another PLM for generation, controlled by conditioning on social group and attribute terms. We show that the generated sentences are natural and similar to human-produced content in terms of word length and diversity. We find that our bias scores are well correlated with those obtained from manual templates, but that AutoBiasTest also highlights biases these templates do not capture, owing to its more diverse and realistic contexts. By automating large-scale test sentence generation, we enable better estimation of underlying bias distributions.
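
To make the idea of conditioning on social group and attribute terms more concrete, here is a minimal sketch of the bookkeeping around such a generator. It is not the authors' implementation: the abstract does not give the prompt wording, the filtering rules, or the scoring procedure, so every name here (`build_prompt`, `contains_term`, `keep_sentence`, `swap_group`) and the prompt text itself are assumptions made for illustration. The call to an actual generator PLM is left out on purpose.

```python
import re


def build_prompt(group_term: str, attribute_term: str) -> str:
    """Hypothetical prompt asking a generator PLM for a sentence with both terms."""
    return (
        "Write a natural sentence that includes the words "
        f"'{group_term}' and '{attribute_term}'."
    )


def contains_term(sentence: str, term: str) -> bool:
    """True if `term` occurs in `sentence` as a whole word (case-insensitive)."""
    pattern = rf"\b{re.escape(term)}\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def keep_sentence(sentence: str, group_term: str, attribute_term: str) -> bool:
    """Assumed control check: keep a sentence only if both terms appear in it."""
    return contains_term(sentence, group_term) and contains_term(sentence, attribute_term)


def swap_group(sentence: str, group_term: str, other_group_term: str) -> str:
    """Build a counterfactual sentence by replacing the group term.

    Simplification: the replacement is inserted as given, so the original
    capitalization of the matched word is not preserved.
    """
    pattern = rf"\b{re.escape(group_term)}\b"
    return re.sub(pattern, other_group_term, sentence, flags=re.IGNORECASE)


# Stand-in outputs; a real pipeline would obtain these from a generator PLM.
candidates = [
    "The man worked as an engineer.",
    "Engineers build bridges.",
]
kept = [s for s in candidates if keep_sentence(s, "man", "engineer")]
counterfactuals = [swap_group(s, "man", "woman") for s in kept]
```

Sentence pairs such as those in `kept` and `counterfactuals` could then be passed to the PLM under test to compare how it treats each group in an otherwise identical context. How that comparison is turned into a bias score is outside the scope of this sketch.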

Cite

Text

Kocielnik et al. "AutoBiasTest: Controllable Test Sentence Generation for Open-Ended Social Bias Testing in Language Models at Scale." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Kocielnik et al. "AutoBiasTest: Controllable Test Sentence Generation for Open-Ended Social Bias Testing in Language Models at Scale." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/kocielnik2023icmlw-autobiastest/)

BibTeX

@inproceedings{kocielnik2023icmlw-autobiastest,
  title     = {{AutoBiasTest: Controllable Test Sentence Generation for Open-Ended Social Bias Testing in Language Models at Scale}},
  author    = {Kocielnik, Rafal Dariusz and Prabhumoye, Shrimai and Zhang, Vivian L and Alvarez, R. Michael and Anandkumar, Anima},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/kocielnik2023icmlw-autobiastest/}
}