CLIP the Gap: A Single Domain Generalization Approach for Object Detection
Abstract
Single Domain Generalization (SDG) tackles the problem of training a model on a single source domain so that it generalizes to any unseen target domain. While this has been well studied for image classification, the literature on SDG object detection remains almost non-existent. To address the challenges of simultaneously learning robust object localization and representation, we propose to leverage a pre-trained vision-language model to introduce semantic domain concepts via textual prompts. We achieve this via a semantic augmentation strategy acting on the features extracted by the detector backbone, as well as a text-based classification loss. Our experiments demonstrate the benefits of our approach, which outperforms the only existing SDG object detection method, Single-DGOD [49], by 10% on their own diverse weather-driving benchmark.
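The two ingredients named in the abstract, a semantic augmentation of backbone features and a text-based classification loss, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the additive form of the augmentation, the temperature value, and the toy vectors standing in for text and region embeddings are all assumptions made for illustration only.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def semantic_augment(feats, offset):
    # Assumption: the augmentation is modeled as a simple additive shift of the
    # features; in practice the shift would be derived from textual prompts.
    return feats + offset

def text_based_classification_loss(region_feats, text_embeds, labels, temperature=0.01):
    # Cosine-similarity logits between region features and per-class text
    # embeddings, followed by softmax cross-entropy (hypothetical formulation).
    logits = l2_normalize(region_feats) @ l2_normalize(text_embeds).T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

# Toy data: three classes whose "text embeddings" are orthogonal unit vectors.
text_embeds = np.eye(3)
region_feats = np.eye(3)                     # each region matches one class exactly
offset = np.full((1, 3), 0.1)                # hypothetical domain shift
augmented = semantic_augment(region_feats, offset)

loss_matched = text_based_classification_loss(augmented, text_embeds, np.array([0, 1, 2]))
loss_mismatched = text_based_classification_loss(augmented, text_embeds, np.array([1, 2, 0]))
```

In this toy setup, features that align with the text embedding of their labeled class give a loss close to zero, while mismatched labels give a much larger loss, which is the behavior a similarity-based classification head is meant to have.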
Cite
Text
Vidit et al. "CLIP the Gap: A Single Domain Generalization Approach for Object Detection." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.00314
Markdown
[Vidit et al. "CLIP the Gap: A Single Domain Generalization Approach for Object Detection." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/vidit2023cvpr-clip/) doi:10.1109/CVPR52729.2023.00314
BibTeX
@inproceedings{vidit2023cvpr-clip,
title = {{CLIP the Gap: A Single Domain Generalization Approach for Object Detection}},
author = {Vidit, Vidit and Engilberge, Martin and Salzmann, Mathieu},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {3219-3229},
doi = {10.1109/CVPR52729.2023.00314},
url = {https://mlanthology.org/cvpr/2023/vidit2023cvpr-clip/}
}