The Effect of Model Size on Worst-Group Generalization
Abstract
Overparameterization has been shown to hurt test accuracy on rare subgroups under a fixed reweighting regime. To gain a more complete picture, we consider the case where subgroup information is unknown. We investigate the effect of model size on worst-group generalization under empirical risk minimization (ERM) across a wide range of settings, varying: 1) architecture (ResNet, VGG, or BERT), 2) domain (vision or natural language processing), 3) model size (width or depth), and 4) initialization (pre-trained or random weights). Our systematic evaluation reveals that increasing model size does not hurt, and may help, worst-group test performance under ERM across all setups. In particular, increasing pre-trained model size consistently improves performance on Waterbirds and MultiNLI. We advise practitioners to use larger pre-trained models when subgroup labels are unknown.
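The worst-group metric studied here is the minimum of the per-subgroup test accuracies. A minimal sketch of that evaluation, assuming plain Python lists of predictions, labels, and subgroup ids (the function name and interface are illustrative, not from the paper's released code):

```python
def worst_group_accuracy(preds, labels, groups):
    """Return the minimum per-subgroup accuracy.

    preds, labels, groups are equal-length sequences; `groups[i]` is the
    subgroup id of example i (e.g. a (class, spurious-attribute) pair).
    """
    per_group = {}  # group id -> (num correct, num total)
    for p, y, g in zip(preds, labels, groups):
        correct, total = per_group.get(g, (0, 0))
        per_group[g] = (correct + (p == y), total + 1)
    # Worst-group accuracy is driven by the rarest/hardest subgroup,
    # so it can be far below the average accuracy.
    return min(c / t for c, t in per_group.values())
```

For example, a model that is 50% accurate on one subgroup and perfect on another has 75% average accuracy but only 50% worst-group accuracy.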
Cite
Text
Le Pham et al. "The Effect of Model Size on Worst-Group Generalization." NeurIPS 2021 Workshops: DistShift, 2021.
Markdown
[Le Pham et al. "The Effect of Model Size on Worst-Group Generalization." NeurIPS 2021 Workshops: DistShift, 2021.](https://mlanthology.org/neuripsw/2021/pham2021neuripsw-effect/)
BibTeX
@inproceedings{pham2021neuripsw-effect,
title = {{The Effect of Model Size on Worst-Group Generalization}},
author = {Le Pham, Alan and Chan, Eunice and Srivatsa, Vikranth and Ghosh, Dhruba and Yang, Yaoqing and Yu, Yaodong and Zhong, Ruiqi and Gonzalez, Joseph E. and Steinhardt, Jacob},
booktitle = {NeurIPS 2021 Workshops: DistShift},
year = {2021},
url = {https://mlanthology.org/neuripsw/2021/pham2021neuripsw-effect/}
}