Extra Training Provides a Strong Baseline for CLIP
Abstract
Contrastive Language-Image Pretraining (CLIP) models exhibit strong performance on a range of vision tasks. To improve this class of models further, several works have proposed modifications to the CLIP training procedure. In this work, we show that substantial gains are achievable with a much simpler strategy. Specifically, existing CLIP models---especially those trained on smaller datasets---tend to be undertrained. As a result, simply extending training according to a straightforward heuristic can significantly improve the performance of CLIP models.
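The abstract does not spell out the heuristic itself, so the sketch below is a hedged illustration only: one way "extra training" could look in practice, namely resuming contrastive training of an already-pretrained CLIP model with the open_clip library. The checkpoint name, learning rate, extra-epoch count, and synthetic batch are all assumptions for illustration, not the authors' recipe.

```python
# Hypothetical sketch: resuming contrastive training of a pretrained CLIP model.
# The checkpoint, learning rate, and EXTRA_EPOCHS are illustrative assumptions,
# not the paper's prescribed heuristic.
import torch
import torch.nn.functional as F
import open_clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion400m_e32"  # assumed starting checkpoint
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model = model.to(device).train()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small LR for continued training


def clip_loss(image_features, text_features, logit_scale):
    # Standard symmetric contrastive (InfoNCE) loss used in CLIP training.
    image_features = F.normalize(image_features, dim=-1)
    text_features = F.normalize(text_features, dim=-1)
    logits = logit_scale * image_features @ text_features.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2


# Placeholder loader: in practice this would stream the original pretraining
# dataset; a single synthetic batch stands in here so the sketch runs.
images = torch.randn(4, 3, 224, 224)
texts = tokenizer(["a photo of a dog", "a diagram", "a cat", "a truck"])
loader = [(images, texts)]

EXTRA_EPOCHS = 8  # assumed additional training budget
for epoch in range(EXTRA_EPOCHS):
    for batch_images, batch_texts in loader:
        batch_images = batch_images.to(device)
        batch_texts = batch_texts.to(device)
        image_features = model.encode_image(batch_images)
        text_features = model.encode_text(batch_texts)
        loss = clip_loss(image_features, text_features, model.logit_scale.exp())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```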
Cite
Text
Khaddaj et al. "Extra Training Provides a Strong Baseline for CLIP." NeurIPS 2023 Workshops: R0-FoMo, 2023.
Markdown
[Khaddaj et al. "Extra Training Provides a Strong Baseline for CLIP." NeurIPS 2023 Workshops: R0-FoMo, 2023.](https://mlanthology.org/neuripsw/2023/khaddaj2023neuripsw-extra/)
BibTeX
@inproceedings{khaddaj2023neuripsw-extra,
title = {{Extra Training Provides a Strong Baseline for CLIP}},
author = {Khaddaj, Alaa and Salman, Hadi and Ilyas, Andrew and Leclerc, Guillaume and Madry, Aleksander},
booktitle = {NeurIPS 2023 Workshops: R0-FoMo},
year = {2023},
url = {https://mlanthology.org/neuripsw/2023/khaddaj2023neuripsw-extra/}
}