A Data-Based Perspective on Transfer Learning

Abstract

It is commonly believed that more pre-training data leads to better transfer learning performance. However, recent evidence suggests that removing data from the source dataset can actually help too. In this work, we present a framework for probing the impact of the source dataset's composition on transfer learning performance. Our framework facilitates new capabilities such as identifying transfer learning brittleness and detecting pathologies such as data leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer performance from ImageNet on a variety of transfer tasks.

Cite

Text

Jain et al. "A Data-Based Perspective on Transfer Learning." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00352

Markdown

[Jain et al. "A Data-Based Perspective on Transfer Learning." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/jain2023cvpr-databased/) doi:10.1109/CVPR52729.2023.00352

BibTeX

@inproceedings{jain2023cvpr-databased,
  title     = {{A Data-Based Perspective on Transfer Learning}},
  author    = {Jain, Saachi and Salman, Hadi and Khaddaj, Alaa and Wong, Eric and Park, Sung Min and Mądry, Aleksander},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2023},
  pages     = {3613--3622},
  doi       = {10.1109/CVPR52729.2023.00352},
  url       = {https://mlanthology.org/cvpr/2023/jain2023cvpr-databased/}
}