Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces

Abstract

Negative sampling schemes enable efficient training given a large number of classes, by offering a means to approximate a computationally expensive loss function that takes all labels into account. In this paper, we present a new connection between these schemes and loss modification techniques for countering label imbalance. We show that different negative sampling schemes implicitly trade off performance on dominant versus rare labels. Further, we provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance. We empirically verify our findings on long-tail classification and retrieval benchmarks.
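
The abstract contrasts a full loss over all labels with sampled approximations, and points to loss modifications for label imbalance. The sketch below is a minimal NumPy illustration of these two ingredients, assuming a generic sampled softmax with a standard proposal-based correction for sampling bias and a logit-adjustment style shift by the label prior for labeling bias; the exact corrections analyzed in the paper may differ, and the function names, proposal distribution, and toy prior here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def full_softmax_loss(logits, label):
    """Exact softmax cross-entropy over all labels (costly when the label space is large)."""
    shifted = logits - logits.max()
    return np.log(np.exp(shifted).sum()) - shifted[label]

def sampled_softmax_loss(logits, label, num_neg, proposal):
    """Approximate loss from the positive plus a few sampled negatives.

    Subtracting log(num_neg * proposal) from each negative logit is one
    standard correction for the sampling bias that comes from scoring
    only a subset of labels (illustrative; not necessarily the paper's scheme).
    """
    negs = rng.choice(len(logits), size=num_neg, replace=False, p=proposal)
    negs = negs[negs != label]  # drop an accidental draw of the positive
    sub = np.concatenate(([logits[label]],
                          logits[negs] - np.log(num_neg * proposal[negs])))
    sub = sub - sub.max()
    return np.log(np.exp(sub).sum()) - sub[0]

def logit_adjusted(logits, class_prior, tau=1.0):
    """Logit-adjustment style shift by the label prior: one common loss
    modification for labeling bias (label imbalance)."""
    return logits + tau * np.log(class_prior)

# Toy setup: 10,000 labels with a Zipf-like long-tailed prior.
num_labels = 10_000
logits = rng.normal(size=num_labels)
prior = 1.0 / np.arange(1, num_labels + 1)
prior /= prior.sum()
uniform = np.full(num_labels, 1.0 / num_labels)
label = 0  # a head label, for the demo

print("full loss            :", full_softmax_loss(logits, label))
print("sampled approximation:", sampled_softmax_loss(logits, label, 100, uniform))
print("full loss, adjusted  :", full_softmax_loss(logit_adjusted(logits, prior), label))
```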

Cite

Text

Rawat et al. "Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces." International Conference on Machine Learning, 2021.

Markdown

[Rawat et al. "Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces." International Conference on Machine Learning, 2021.](https://mlanthology.org/icml/2021/rawat2021icml-disentangling/)

BibTeX

@inproceedings{rawat2021icml-disentangling,
  title     = {{Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces}},
  author    = {Rawat, Ankit Singh and Menon, Aditya K and Jitkrittum, Wittawat and Jayasumana, Sadeep and Yu, Felix and Reddi, Sashank and Kumar, Sanjiv},
  booktitle = {International Conference on Machine Learning},
  year      = {2021},
  pages     = {8890--8901},
  volume    = {139},
  url       = {https://mlanthology.org/icml/2021/rawat2021icml-disentangling/}
}