Open-Sampling: Exploring Out-of-Distribution Data for Re-Balancing Long-Tailed Datasets

Abstract

Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance. Recent studies have found that directly training with out-of-distribution data (i.e., open-set samples) in a semi-supervised manner harms generalization performance. In this work, we theoretically show, from a Bayesian perspective, that out-of-distribution data can still be leveraged to augment the minority classes. Based on this motivation, we propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset. For each open-set instance, the label is sampled from a pre-defined distribution that is complementary to the distribution of the original class priors. We empirically show that Open-sampling not only re-balances the class priors but also encourages the neural network to learn separable representations. Extensive experiments demonstrate that our proposed method significantly outperforms existing data re-balancing methods and can boost the performance of existing state-of-the-art methods.
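
The core mechanism is straightforward: count each class's training examples, measure its deficit from a balanced target, and draw labels for open-set instances in proportion to those deficits. Below is a minimal NumPy sketch of this idea; the function name, the balanced-target formula, and the clipping rule are illustrative assumptions, not the authors' released implementation.

import numpy as np

def complementary_label_distribution(class_counts, num_open_set):
    """Sketch of a label distribution for open-set samples that is
    complementary to the imbalanced class priors of the training set.

    Each class's probability is proportional to its deficit from a
    balanced per-class target, so sampled open-set labels fill in the
    tail classes first.
    """
    class_counts = np.asarray(class_counts, dtype=np.float64)
    # Per-class count the combined (original + open-set) dataset would
    # have if it were perfectly balanced.
    target = (class_counts.sum() + num_open_set) / len(class_counts)
    # Head classes already above the target receive no open-set labels.
    deficits = np.clip(target - class_counts, 0.0, None)
    return deficits / deficits.sum()

# Example: a 5-class long-tailed dataset with 10k open-set images.
counts = [5000, 2000, 800, 300, 100]
probs = complementary_label_distribution(counts, num_open_set=10_000)
rng = np.random.default_rng(0)
open_set_labels = rng.choice(len(counts), size=10_000, p=probs)
# Tail classes are lifted toward the balanced target.
print(np.bincount(open_set_labels, minlength=len(counts)) + np.asarray(counts))

Sampling from this distribution lifts the tail classes toward the balanced target, while head classes that already exceed it are left untouched.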

Cite

Text

Wei et al. "Open-Sampling: Exploring Out-of-Distribution Data for Re-Balancing Long-Tailed Datasets." International Conference on Machine Learning, 2022.

Markdown

[Wei et al. "Open-Sampling: Exploring Out-of-Distribution Data for Re-Balancing Long-Tailed Datasets." International Conference on Machine Learning, 2022.](https://mlanthology.org/icml/2022/wei2022icml-opensampling/)

BibTeX

@inproceedings{wei2022icml-opensampling,
  title     = {{Open-Sampling: Exploring Out-of-Distribution Data for Re-Balancing Long-Tailed Datasets}},
  author    = {Wei, Hongxin and Tao, Lue and Xie, Renchunzi and Feng, Lei and An, Bo},
  booktitle = {International Conference on Machine Learning},
  year      = {2022},
  pages     = {23615--23630},
  volume    = {162},
  url       = {https://mlanthology.org/icml/2022/wei2022icml-opensampling/}
}