Augmentation Alone Leads to Generalization
Abstract
We study self-supervised representation learning with data augmentation, such as contrastive learning and masked image/language modeling. Our main result is that a sufficiently good data augmentation alone can lead to good generalization: we prove generalization bounds that hold for an arbitrary encoder, via a model-free analysis. Our analysis models the upstream stage as RKHS approximation and the downstream stage as RKHS regression, where the RKHS is fully determined by the augmentation. We identify augmentation complexity as a key quantity that replaces model complexity, and we additionally use it to quantitatively analyze augmentations on real datasets.
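To make the two-stage picture concrete, below is a minimal, hypothetical sketch (not the paper's actual construction) of how an augmentation distribution alone can induce a kernel, and how the downstream task then reduces to kernel ridge regression in that RKHS. The toy augmentation scheme, the kernel `k(x, x') = Σ_a p(a|x) p(a|x')`, and the target function are all illustrative assumptions.

```python
import numpy as np

# Toy augmentation: each input x in {1, ..., 28} is augmented to one of the
# "views" x-1, x, x+1 with probability 1/3 each (views live in {0, ..., 29}).
def aug_probs(x, n_views):
    p = np.zeros(n_views)
    for a in (x - 1, x, x + 1):
        if 0 <= a < n_views:
            p[a] += 1.0 / 3.0
    return p

n_views = 30
xs = np.arange(1, n_views - 1)  # training inputs

# The augmentation alone induces a kernel: two inputs are similar iff their
# augmentation distributions overlap, k(x, x') = sum_a p(a|x) p(a|x').
P = np.stack([aug_probs(int(x), n_views) for x in xs])
K = P @ P.T  # augmentation-induced Gram matrix (positive semi-definite)

# Downstream stage as kernel ridge regression in this RKHS, on a smooth
# illustrative target; smooth targets align with the kernel's top
# eigenfunctions and are therefore easy to fit.
y = np.sin(xs / 4.0)
lam = 1e-3
alpha = np.linalg.solve(K + lam * np.eye(len(xs)), y)

y_hat = K @ alpha  # in-sample predictions
print("train MSE:", float(np.mean((y_hat - y) ** 2)))
```

Note that no encoder or model class appears anywhere in this sketch: the Gram matrix is fully determined by the augmentation scheme, echoing the abstract's claim that the RKHS is fully determined by the augmentation.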
Cite
Text

Zhai et al. "Augmentation Alone Leads to Generalization." ICLR 2024 Workshops: R2-FM, 2024.

Markdown

[Zhai et al. "Augmentation Alone Leads to Generalization." ICLR 2024 Workshops: R2-FM, 2024.](https://mlanthology.org/iclrw/2024/zhai2024iclrw-augmentation/)

BibTeX
@inproceedings{zhai2024iclrw-augmentation,
  title     = {{Augmentation Alone Leads to Generalization}},
  author    = {Zhai, Runtian and Liu, Bingbin and Risteski, Andrej and Kolter, J Zico and Ravikumar, Pradeep Kumar},
  booktitle = {ICLR 2024 Workshops: R2-FM},
  year      = {2024},
  url       = {https://mlanthology.org/iclrw/2024/zhai2024iclrw-augmentation/}
}