Twin Studies of Factors in OOD Generalization

Abstract

Studies of model behavior often neglect the role of random variation in out-of-distribution (OOD) behavior. To enable research on the interplay between random factors and training conditions in determining model behavior, we describe a simple ambiguous setting where models can learn either a counting-based or a hierarchical classification rule. We find that LSTMs consistently learn the hierarchical rule, while transformer models exhibit diverse generalization rules across different hyperparameter and random seed settings. We analyze this model population both during and at the end of training, using natural variation to draw conclusions about the determinants of model performance and generalization. In particular, we quantify the impact of particular hyperparameter choices, finding that different model depths favor different rules and that regularization drives multimodally distributed generalization capabilities. We also release the weights of the 270 transformer models we trained, spanning a wide range of OOD behavior, which can serve as a sandbox for theoretical and interpretability investigations.
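
The rule ambiguity the abstract describes can be made concrete with a toy bracket task. The Python sketch below is a hypothetical illustration under assumed task details, not the paper's released dataset or code: on the training set, a counting rule (equal numbers of open and close brackets) and a hierarchical rule (well-nested brackets) assign identical labels, so a model can fit either; held-out OOD strings on which the two rules disagree reveal which rule a trained model actually learned.

# Hypothetical sketch of an ambiguous classification task (assumed
# details; not the paper's actual dataset). On the training set the
# counting rule and the hierarchical rule agree on every label.
import random

def is_count_balanced(s):
    # Counting rule: same number of '(' and ')'.
    return s.count("(") == s.count(")")

def is_well_nested(s):
    # Hierarchical rule: depth never goes negative and ends at zero.
    depth = 0
    for c in s:
        depth += 1 if c == "(" else -1
        if depth < 0:
            return False
    return depth == 0

def sample_train(n, length=8):
    # Training examples on which both rules agree: well-nested
    # positives, count-imbalanced negatives. Strings where the rules
    # disagree (count-balanced but not nested) are excluded.
    data = []
    while len(data) < n:
        s = "".join(random.choice("()") for _ in range(length))
        if is_well_nested(s) or not is_count_balanced(s):
            data.append((s, int(is_well_nested(s))))
    return data

# OOD probe: count-balanced but NOT well-nested strings, where the two
# rules disagree and a model's prediction exposes which rule it learned.
ood = [s for s in ("()()", ")(", "(())", "))((")
       if is_count_balanced(s) and not is_well_nested(s)]
print(ood)  # [')(', '))((']

A model that outputs label 1 on these probe strings has, by construction, generalized with the counting rule; a model that outputs 0 has generalized hierarchically, mirroring the LSTM/transformer split the abstract reports.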

Cite

Text

Li et al. "Twin Studies of Factors in OOD Generalization." NeurIPS 2024 Workshops: SciForDL, 2024.

Markdown

[Li et al. "Twin Studies of Factors in OOD Generalization." NeurIPS 2024 Workshops: SciForDL, 2024.](https://mlanthology.org/neuripsw/2024/li2024neuripsw-twin/)

BibTeX

@inproceedings{li2024neuripsw-twin,
  title     = {{Twin Studies of Factors in OOD Generalization}},
  author    = {Li, Victoria R and Kaufmann, Jenny and Alvarez-Melis, David and Saphra, Naomi},
  booktitle = {NeurIPS 2024 Workshops: SciForDL},
  year      = {2024},
  url       = {https://mlanthology.org/neuripsw/2024/li2024neuripsw-twin/}
}