Navigating Towards Fairness with Data Selection
Abstract
Machine learning algorithms often struggle to eliminate inherent data biases, particularly those arising from unreliable labels, which poses a significant challenge to ensuring fairness. Existing fairness techniques that address label bias typically modify the model and intervene in the training process, but these approaches lack the flexibility needed for large-scale datasets. To address this limitation, we introduce a data selection method that mitigates label bias efficiently and flexibly, tailored to practical needs. Our approach uses a zero-shot predictor as a proxy model that simulates training on a clean holdout set. Supported by peer predictions, this strategy ensures the fairness of the proxy model and eliminates the need for an additional holdout set, a common requirement in previous methods. Without altering the classifier's architecture, our modality-agnostic method selects appropriate training data, and experiments on diverse datasets show it handles label bias and improves fairness both efficiently and effectively.
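To make the abstract's pipeline concrete, here is a minimal sketch of selecting training data with a zero-shot proxy model combined with a peer-prediction-style score. It is an illustration under stated assumptions, not the authors' exact algorithm: `zero-shot proxy probabilities` stand in for any zero-shot predictor's class probabilities, the peer term follows the general peer-loss idea of comparing each example against a randomly re-paired label, and the `keep_ratio` threshold is a hypothetical selection rule.

```python
# Hedged sketch: data selection against label bias via a zero-shot proxy.
# Assumptions (not from the paper): `proba` holds the proxy model's
# class probabilities (n x C), and selection keeps the lowest-scoring
# fraction of examples under a peer-loss-style adjusted loss.
import numpy as np

def peer_adjusted_scores(proba: np.ndarray, labels: np.ndarray,
                         rng: np.random.Generator) -> np.ndarray:
    """Per-example score: proxy cross-entropy on the observed label
    minus proxy cross-entropy on a randomly re-paired 'peer' label."""
    n = len(labels)
    eps = 1e-12
    # Loss of each example's own (possibly biased) label under the proxy.
    own_loss = -np.log(proba[np.arange(n), labels] + eps)
    # Peer term: the same inputs scored against shuffled labels.
    peer_labels = labels[rng.permutation(n)]
    peer_loss = -np.log(proba[np.arange(n), peer_labels] + eps)
    return own_loss - peer_loss  # low score = label likely clean

def select_training_subset(proba, labels, keep_ratio=0.8, seed=0):
    """Return indices of the `keep_ratio` fraction of examples with the
    lowest peer-adjusted proxy loss; the downstream classifier is then
    trained only on this subset, with its architecture unchanged."""
    scores = peer_adjusted_scores(np.asarray(proba, dtype=float),
                                  np.asarray(labels),
                                  np.random.default_rng(seed))
    k = int(keep_ratio * len(labels))
    return np.argsort(scores)[:k]
```

Because the score uses only the proxy model's outputs, the selection step never touches the classifier or its training loop, which is what makes the approach modality-agnostic in the sense the abstract describes.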
Cite
Text
Zhang et al. "Navigating Towards Fairness with Data Selection." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I21.34422
Markdown
[Zhang et al. "Navigating Towards Fairness with Data Selection." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/zhang2025aaai-navigating/) doi:10.1609/AAAI.V39I21.34422
BibTeX
@inproceedings{zhang2025aaai-navigating,
title = {{Navigating Towards Fairness with Data Selection}},
author = {Zhang, Yixuan and Li, Zhidong and Wang, Yang and Chen, Fang and Fan, Xuhui and Zhou, Feng},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {22632--22640},
doi = {10.1609/AAAI.V39I21.34422},
url = {https://mlanthology.org/aaai/2025/zhang2025aaai-navigating/}
}