Soft-Label Integration for Robust Toxicity Classification

Abstract

Toxicity classification in textual content remains a significant problem. Data labeled by a single annotator fall short of capturing the diversity of human perspectives, so there is a growing need to incorporate crowdsourced annotations when training an effective toxicity classifier. In addition, the standard approach of training a classifier with empirical risk minimization (ERM) may exploit spurious correlations and therefore fail to handle distribution shifts between the training set and the test set. This work introduces a novel bi-level optimization framework that integrates crowdsourced annotations through soft labeling and optimizes the soft-label weights with Group Distributionally Robust Optimization (GroupDRO) to improve robustness against out-of-distribution (OOD) risk. We theoretically prove the convergence of our bi-level optimization algorithm. Experimental results demonstrate that our approach outperforms existing baseline methods in both average and worst-group accuracy, confirming its effectiveness in leveraging crowdsourced annotations to achieve more effective and robust toxicity classification.
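The abstract names two ingredients, soft labels built from crowdsourced annotations and a GroupDRO objective, without giving the algorithm itself. The sketch below is only an illustration of those two ingredients under stated assumptions; it is not the authors' bi-level procedure. It assumes binary annotator votes, softmax-normalized annotator weights, and the standard exponentiated-gradient update for GroupDRO group weights. All function names, the step size `eta`, and the toy data are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_label(annotations, weight_logits):
    """Blend binary annotator votes (1 = toxic) into P(toxic).

    Assumption: annotator weights are a softmax over free logits.
    """
    weights = softmax(weight_logits)
    return sum(w * a for w, a in zip(weights, annotations))

def cross_entropy(pred, target, eps=1e-12):
    """Binary cross-entropy between a predicted P(toxic) and a soft target."""
    p = min(max(pred, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def group_losses(preds, targets, groups, num_groups):
    """Mean loss per group; groups are integer ids in [0, num_groups)."""
    sums = [0.0] * num_groups
    counts = [0] * num_groups
    for p, t, g in zip(preds, targets, groups):
        sums[g] += cross_entropy(p, t)
        counts[g] += 1
    return [s / c if c else 0.0 for s, c in zip(sums, counts)]

def groupdro_step(q, losses, eta=0.1):
    """Exponentiated-gradient update of group weights, then the robust loss."""
    unnorm = [qi * math.exp(eta * li) for qi, li in zip(q, losses)]
    z = sum(unnorm)
    q_new = [u / z for u in unnorm]
    robust = sum(qi * li for qi, li in zip(q_new, losses))
    return q_new, robust

# Toy data (hypothetical): 4 comments, 3 annotators, 2 groups.
votes = [[1, 1, 0], [0, 0, 1], [1, 0, 0], [0, 0, 0]]
weight_logits = [0.0, 0.0, 0.0]      # uniform annotator weights
preds = [0.9, 0.2, 0.6, 0.1]         # classifier outputs, P(toxic)
groups = [0, 0, 1, 1]

targets = [soft_label(v, weight_logits) for v in votes]
losses = group_losses(preds, targets, groups, num_groups=2)
q, robust_loss = groupdro_step([0.5, 0.5], losses)
```

In this sketch the group with the larger mean loss receives the larger weight after the update, so the robust loss is at least the plain average of the group losses. A bi-level scheme in the spirit of the abstract would additionally tune `weight_logits` in an outer loop, but how the paper does so is not stated here.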

Cite

Text

Cheng et al. "Soft-Label Integration for Robust Toxicity Classification." Neural Information Processing Systems, 2024. doi:10.52202/079017-3004

Markdown

[Cheng et al. "Soft-Label Integration for Robust Toxicity Classification." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/cheng2024neurips-softlabel/) doi:10.52202/079017-3004

BibTeX

@inproceedings{cheng2024neurips-softlabel,
  title     = {{Soft-Label Integration for Robust Toxicity Classification}},
  author    = {Cheng, Zelei and Wu, Xian and Yu, Jiahao and Han, Shuo and Cai, Xin-Qiang and Xing, Xinyu},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-3004},
  url       = {https://mlanthology.org/neurips/2024/cheng2024neurips-softlabel/}
}