On the Trade-Off of Intra-/Inter-Class Diversity for Supervised Pre-Training

Abstract

Pre-training datasets are critical for building state-of-the-art machine learning models, motivating rigorous study of their impact on downstream tasks. In this work, we study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset. Empirically, we find that with the size of the pre-training dataset fixed, the best downstream performance comes from a balance between intra- and inter-class diversity. To understand the underlying mechanism, we show theoretically that downstream performance depends monotonically on both types of diversity. Notably, our theory reveals that the optimal class-to-sample ratio (#classes / #samples per class) is invariant to the size of the pre-training dataset, which motivates an application: predicting the optimal number of pre-training classes. We demonstrate the effectiveness of this application with an improvement of around 2 points on downstream tasks when using ImageNet as the pre-training dataset.
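
The ratio-invariance claim lends itself to a simple prediction rule: if the pre-training set has N = C · m images (C classes, m samples per class) and the optimal class-to-sample ratio r* = C / m does not depend on N, then the optimal class count is C* = sqrt(r* · N). The Python sketch below is only a minimal illustration of this arithmetic under that assumption; the function name and the numbers used are hypothetical and not taken from the paper.

import math

def predict_optimal_num_classes(optimal_ratio, dataset_size):
    """Predict the optimal number of pre-training classes.

    Assumes dataset_size = num_classes * samples_per_class and that the
    optimal class-to-sample ratio r* = num_classes / samples_per_class
    is invariant to dataset_size, so num_classes* = sqrt(r* * dataset_size).
    """
    return round(math.sqrt(optimal_ratio * dataset_size))

# Hypothetical numbers: suppose a sweep on a 100,000-image budget found the
# best trade-off at 200 classes x 500 samples per class, i.e. r* = 0.4.
r_star = 200 / 500
for n in (100_000, 400_000, 1_200_000):
    c = predict_optimal_num_classes(r_star, n)
    print(f"N={n:>9,d}  ->  ~{c} classes, ~{n // c} samples per class")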

Cite

Text

Zhang et al. "On the Trade-Off of Intra-/Inter-Class Diversity for Supervised Pre-Training." Neural Information Processing Systems, 2023.

Markdown

[Zhang et al. "On the Trade-Off of Intra-/Inter-Class Diversity for Supervised Pre-Training." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/zhang2023neurips-tradeoff-a/)

BibTeX

@inproceedings{zhang2023neurips-tradeoff-a,
  title     = {{On the Trade-Off of Intra-/Inter-Class Diversity for Supervised Pre-Training}},
  author    = {Zhang, Jieyu and Wang, Bohan and Hu, Zhengyu and Koh, Pang Wei W and Ratner, Alexander J},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/zhang2023neurips-tradeoff-a/}
}