On the Connection Between Pre-Training Data Diversity and Robustness

Abstract

Our work studies the implications of transfer learning for model behavior beyond accuracy: how does the pre-training distribution affect the downstream robustness of a fine-tuned model? We analyze effective robustness using the framework proposed by Taori et al. (2020), which shows that in-distribution and out-of-distribution performance are highly correlated along a linear trend. We explore various interventions that significantly alter the pre-training distribution, including changes to the label space, label semantics, and the pre-training dataset itself. In most cases, these interventions have minimal impact on the linear trend produced by models pre-trained on the full ImageNet dataset. We demonstrate these findings on pre-training distributions constructed from ImageNet and iNaturalist, with the fine-tuning task being iWildCam-WILDS animal classification.
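
In the effective-robustness framework of Taori et al. (2020), a linear trend is fit between (logit-transformed) in-distribution and out-of-distribution accuracies over a set of baseline models, and a model's effective robustness is how far its out-of-distribution accuracy sits above that trend. The Python sketch below illustrates this computation with hypothetical accuracy values; it is a minimal illustration of the framework, not code or results from the paper.

```python
import numpy as np
from scipy.special import logit, expit

def fit_linear_trend(id_accs, ood_accs):
    """Fit the baseline linear trend between logit-transformed
    in-distribution (ID) and out-of-distribution (OOD) accuracies."""
    slope, intercept = np.polyfit(logit(id_accs), logit(ood_accs), deg=1)
    return slope, intercept

def effective_robustness(id_acc, ood_acc, slope, intercept):
    """Effective robustness: observed OOD accuracy minus the OOD
    accuracy predicted by the baseline trend at the same ID accuracy."""
    predicted_ood = expit(slope * logit(id_acc) + intercept)
    return ood_acc - predicted_ood

# Hypothetical accuracies for a set of baseline fine-tuned models
# (placeholder values, not taken from the paper).
baseline_id = np.array([0.60, 0.68, 0.74, 0.80])
baseline_ood = np.array([0.35, 0.42, 0.48, 0.55])

slope, intercept = fit_linear_trend(baseline_id, baseline_ood)

# Effective robustness of a new model pre-trained on a modified distribution.
print(effective_robustness(id_acc=0.75, ood_acc=0.52,
                           slope=slope, intercept=intercept))
```

A positive value means the model is more robust than the baseline trend predicts; the paper's observation is that most pre-training interventions leave models on (or near) the original trend.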

Cite

Text

Ramanujan et al. "On the Connection Between Pre-Training Data Diversity and Robustness." ICML 2022 Workshops: Pre-Training, 2022.

Markdown

[Ramanujan et al. "On the Connection Between Pre-Training Data Diversity and Robustness." ICML 2022 Workshops: Pre-Training, 2022.](https://mlanthology.org/icmlw/2022/ramanujan2022icmlw-connection/)

BibTeX

@inproceedings{ramanujan2022icmlw-connection,
  title     = {{On the Connection Between Pre-Training Data Diversity and Robustness}},
  author    = {Ramanujan, Vivek and Nguyen, Thao and Schmidt, Ludwig and Farhadi, Ali},
  booktitle = {ICML 2022 Workshops: Pre-Training},
  year      = {2022},
  url       = {https://mlanthology.org/icmlw/2022/ramanujan2022icmlw-connection/}
}