Down the Toxicity Rabbit Hole: A Framework to Bias Audit Large Language Models with Key Emphasis on Racism, Antisemitism, and Misogyny

Dutta, Arka; Khorramrouz, Adel; Dutta, Sujan; KhudaBukhsh, Ashiqur R.

doi:10.24963/ijcai.2024/801

IJCAI 2024 pp. 7242-7250

Abstract

Ground Penetrating Radar (GPR) provides detailed subterranean insights. However, underground diagnosis via GPR is hindered by two factors: training data typically contain only normal samples, and GPR data have complex wave-collection characteristics. This paper proposes subsurface anomaly detection in the model space of the Cubic Correlation Reservoir Network (CuCoRes). CuCoRes incorporates three reservoirs, with spatial correlation adjustment in each direction, to accurately capture the multi-directional dynamics (i.e., changing information) within GPR data. By fitting GPR data with CuCoRes and representing the data by the fitted models, the original GPR data are mapped into a category-discriminative CuCoRes model space, where anomalies can be efficiently identified and categorized based on model dissimilarities. The approach requires only a limited amount of easily accessible normal GPR data to support subsequent anomaly detection and categorization, which enhances its applicability in practical scenarios. Experiments on real-world data demonstrate its effectiveness, outperforming state-of-the-art methods.

Cite

Text

Dutta et al. "Down the Toxicity Rabbit Hole: A Framework to Bias Audit Large Language Models with Key Emphasis on Racism, Antisemitism, and Misogyny." International Joint Conference on Artificial Intelligence, 2024. doi:10.24963/ijcai.2024/801

Markdown

[Dutta et al. "Down the Toxicity Rabbit Hole: A Framework to Bias Audit Large Language Models with Key Emphasis on Racism, Antisemitism, and Misogyny." International Joint Conference on Artificial Intelligence, 2024.](https://mlanthology.org/ijcai/2024/dutta2024ijcai-down/) doi:10.24963/ijcai.2024/801

BibTeX

@inproceedings{dutta2024ijcai-down,
  title     = {{Down the Toxicity Rabbit Hole: A Framework to Bias Audit Large Language Models with Key Emphasis on Racism, Antisemitism, and Misogyny}},
  author    = {Dutta, Arka and Khorramrouz, Adel and Dutta, Sujan and KhudaBukhsh, Ashiqur R.},
  booktitle = {International Joint Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {7242--7250},
  doi       = {10.24963/ijcai.2024/801},
  url       = {https://mlanthology.org/ijcai/2024/dutta2024ijcai-down/}
}