Improving Subgroup Robustness via Data Selection

Abstract

Machine learning models can often fail on subgroups that are underrepresented during training. While dataset balancing can improve performance on underperforming groups, it requires access to training group annotations and can end up removing large portions of the dataset. In this paper, we introduce Data Debiasing with Datamodels (D3M), a debiasing approach which isolates and removes specific training examples that drive the model's failures on minority groups. Our approach enables us to efficiently train debiased classifiers while removing only a small number of examples, and does not require training group annotations or additional hyperparameter tuning.
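
The abstract does not spell out the selection rule, so the following is only a rough sketch of the general idea of attribution-based data removal, not the authors' implementation. It assumes a precomputed matrix `attributions` (hypothetical) whose entry `[i, j]` estimates how much training example `j` increases the loss on held-out example `i`, and a boolean `group_mask` (also hypothetical) marking the held-out examples from the underperforming group. The function name and the top-`k` rule are illustrative choices.

```python
import numpy as np


def select_examples_to_drop(attributions, group_mask, k):
    """Return indices of up to k training examples that most harm a group.

    attributions: (n_heldout, n_train) array; entry [i, j] is an assumed
        estimate of how much training example j raises the loss on
        held-out example i (positive means harmful).
    group_mask: (n_heldout,) boolean array selecting the held-out
        examples from the underperforming group (an assumption here).
    k: maximum number of training examples to remove.
    """
    attributions = np.asarray(attributions, dtype=float)
    group_mask = np.asarray(group_mask, dtype=bool)

    # Average estimated harm of each training example on the group.
    harm = attributions[group_mask].mean(axis=0)

    # Rank training examples from most to least harmful.
    order = np.argsort(-harm, kind="stable")
    top = order[:k]

    # Only drop examples whose estimated effect is actually harmful.
    return top[harm[top] > 0]


# Toy data: 3 held-out examples, 4 training examples.
attributions = np.array([
    [0.5, -0.1, 0.0, 0.9],
    [0.3, 0.2, -0.4, 0.7],
    [-0.2, 0.1, 0.6, -0.3],
])
group_mask = np.array([True, True, False])

drop = select_examples_to_drop(attributions, group_mask, k=2)
keep = np.setdiff1d(np.arange(attributions.shape[1]), drop)
print("drop:", drop, "keep:", keep)
```

In this sketch a model would then be retrained on the `keep` indices. Note that the group labels are needed only on the held-out set, which is consistent with the abstract's claim that training group annotations are not required.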

Cite

Text

Jain et al. "Improving Subgroup Robustness via Data Selection." Neural Information Processing Systems, 2024. doi:10.52202/079017-2995

Markdown

[Jain et al. "Improving Subgroup Robustness via Data Selection." Neural Information Processing Systems, 2024.](https://mlanthology.org/neurips/2024/jain2024neurips-improving/) doi:10.52202/079017-2995

BibTeX

@inproceedings{jain2024neurips-improving,
  title     = {{Improving Subgroup Robustness via Data Selection}},
  author    = {Jain, Saachi and Hamidieh, Kimia and Georgiev, Kristian and Ilyas, Andrew and Ghassemi, Marzyeh and Mądry, Aleksander},
  booktitle = {Neural Information Processing Systems},
  year      = {2024},
  doi       = {10.52202/079017-2995},
  url       = {https://mlanthology.org/neurips/2024/jain2024neurips-improving/}
}