On the Connection Between Pre-Training Data Diversity and Robustness
Abstract
Our work studies the implications of transfer learning on model behavior beyond accuracy: how does the pre-training distribution affect the downstream robustness of a fine-tuned model? We analyze effective robustness using the framework proposed by Taori et al. (2020), which demonstrates that in-distribution and out-of-distribution performance are highly correlated along a linear trend. We explore various interventions that significantly alter the pre-training distribution, including changes to the label space, the label semantics, and the pre-training dataset itself. In most cases, these interventions have minimal impact on the original linear trend produced by models pre-trained on the full ImageNet dataset. We demonstrate these findings on pre-training distributions constructed from ImageNet and iNaturalist, with iWildCam-WILDS animal classification as the fine-tuning task.
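To make the effective-robustness analysis referenced above concrete, the snippet below is a minimal sketch of the Taori et al. (2020) setup: fit a linear trend between (axis-transformed) in-distribution and out-of-distribution accuracies of a pool of baseline models, then measure how far a given model sits above that trend. It assumes accuracies lie strictly in (0, 1) and uses a logit axis transform; the function names and the baseline pool are illustrative, not part of the paper's released code.

```python
import numpy as np

def logit(p):
    # Logit transform of an accuracy; assumes 0 < p < 1.
    p = np.asarray(p, dtype=float)
    return np.log(p / (1.0 - p))

def inv_logit(z):
    # Inverse of the logit transform, mapping back to (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def fit_linear_trend(baseline_id_acc, baseline_ood_acc):
    # Fit the baseline linear trend between transformed ID and OOD
    # accuracies of a pool of reference models.
    slope, intercept = np.polyfit(logit(baseline_id_acc),
                                  logit(baseline_ood_acc), deg=1)
    return slope, intercept

def effective_robustness(id_acc, ood_acc, slope, intercept):
    # Effective robustness: OOD accuracy beyond what the baseline
    # trend predicts from the model's ID accuracy alone.
    predicted_ood = inv_logit(slope * logit(id_acc) + intercept)
    return ood_acc - predicted_ood

# Illustrative usage with made-up accuracies (not results from the paper):
slope, intercept = fit_linear_trend([0.60, 0.70, 0.80], [0.40, 0.52, 0.65])
print(effective_robustness(0.75, 0.62, slope, intercept))
```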
Cite
Text
Ramanujan et al. "On the Connection Between Pre-Training Data Diversity and Robustness." ICML 2022 Workshops: Pre-Training, 2022.
Markdown
[Ramanujan et al. "On the Connection Between Pre-Training Data Diversity and Robustness." ICML 2022 Workshops: Pre-Training, 2022.](https://mlanthology.org/icmlw/2022/ramanujan2022icmlw-connection/)
BibTeX
@inproceedings{ramanujan2022icmlw-connection,
title = {{On the Connection Between Pre-Training Data Diversity and Robustness}},
author = {Ramanujan, Vivek and Nguyen, Thao and Schmidt, Ludwig and Farhadi, Ali},
booktitle = {ICML 2022 Workshops: Pre-Training},
year = {2022},
url = {https://mlanthology.org/icmlw/2022/ramanujan2022icmlw-connection/}
}