FALCON: Fair Active Learning for Content Moderation

Abstract

Content moderation is the task of filtering inappropriate content (e.g., rude, hateful, or toxic posts) on online platforms. Deep learning models have been developed to address this task; however, they are prone to making unfair decisions for underrepresented groups such as racial minorities. Most popular methods for improving fairness focus on only a single-group, single-class bias, whereas multi-group and multi-class biases are prevalent and challenging in content moderation. In this paper, we present a novel framework, Fair Active Learning for CONtent moderation (FALCON), that helps mitigate multi-group and multi-class biases simultaneously while maintaining performance. We present a novel group-aware sample selection algorithm that actively selects a subset of the entire dataset for training, and novel augmented uncertainty information that improves the query sample selection strategy by considering group fairness levels. We validate FALCON using multiple fairness evaluation metrics on three public datasets, including the Jigsaw Unintended Bias dataset. Our results show that FALCON maintains performance comparable to several bias mitigation methods while obtaining higher group fairness across multiple axes and datasets, as measured by average improvements of 22.5% in demographic parity difference and 8.4% in equalized odds. Experiments on the Amazon Review dataset demonstrate the general applicability of FALCON beyond content moderation datasets. Warning: some content in this paper may be harmful, racist, and inappropriate.
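The two fairness metrics named in the abstract can be illustrated with a small sketch. This is not the paper's evaluation code: it assumes binary labels (1 = flagged), uses the common "largest gap between any two groups" definition of each metric, and the function names and toy data are hypothetical.

```python
from collections import defaultdict


def selection_rates(y_pred, groups):
    """Fraction of samples predicted positive (1) within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, group in zip(y_pred, groups):
        counts[group][0] += int(pred == 1)
        counts[group][1] += 1
    return {g: pos / total for g, (pos, total) in counts.items()}


def demographic_parity_difference(y_pred, groups):
    """Largest gap in positive-prediction rate between any two groups."""
    rates = selection_rates(y_pred, groups)
    return max(rates.values()) - min(rates.values())


def equalized_odds_difference(y_true, y_pred, groups):
    """Larger of the between-group gaps in true-positive and false-positive rate."""
    gaps = []
    for label in (0, 1):  # label 0 gives the FPR gap, label 1 the TPR gap
        idx = [i for i, y in enumerate(y_true) if y == label]
        if not idx:
            continue
        rates = selection_rates([y_pred[i] for i in idx],
                                [groups[i] for i in idx])
        gaps.append(max(rates.values()) - min(rates.values()))
    return max(gaps, default=0.0)


# Toy data (assumed for illustration): two groups, four posts each.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dpd = demographic_parity_difference(y_pred, groups)
eod = equalized_odds_difference(y_true, y_pred, groups)
print(dpd, eod)
```

Both functions accept any number of group labels, so the same sketch applies to a multi-group setting; lower values indicate more even treatment across groups.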

Cite

Text

Wang et al. "FALCON: Fair Active Learning for Content Moderation." European Conference on Computer Vision Workshops, 2024. doi:10.1007/978-3-031-92648-8_1

Markdown

[Wang et al. "FALCON: Fair Active Learning for Content Moderation." European Conference on Computer Vision Workshops, 2024.](https://mlanthology.org/eccvw/2024/wang2024eccvw-falcon/) doi:10.1007/978-3-031-92648-8_1

BibTeX

@inproceedings{wang2024eccvw-falcon,
  title     = {{FALCON: Fair Active Learning for Content Moderation}},
  author    = {Wang, Zuhui and Sajeev, Sandra and Mittal, Gaurav and Hall, Matthew and Yu, Ye and Yin, Zhaozheng and Chen, Mei},
  booktitle = {European Conference on Computer Vision Workshops},
  year      = {2024},
  pages     = {1--17},
  doi       = {10.1007/978-3-031-92648-8_1},
  url       = {https://mlanthology.org/eccvw/2024/wang2024eccvw-falcon/}
}