Generalizing to Unseen Domains via Text-Guided Augmentation

Abstract

To avoid the high cost of collecting visual data from every test domain in domain adaptation, recent work takes advantage of pre-trained large-scale vision-language models and augments training data with only text descriptions (e.g., "a photo/painting/sketch...") of each test domain. However, in many real-world applications, such text information about test domains is not always available in advance. Moreover, even when all test domains can be verbalized, existing approaches must laboriously train a separate augmentation network for each possible unseen domain, which is time-inefficient. To overcome these challenges, we exploit the multimodal embedding space of a pre-trained vision-language model and propose to acquire training-free, domain-invariant augmentations from text descriptions of arbitrarily crafted unseen domains, which need not match the test domains. Beyond achieving state-of-the-art results, our approach is notably more time-efficient than existing works that require trainable augmentation networks, and it rests on more solid theoretical support.
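
The sketch below illustrates the general idea the abstract describes, not the paper's exact procedure: in CLIP's joint embedding space, the difference between a crafted-domain text embedding (e.g., "a painting") and a source-domain text embedding ("a photo") gives a training-free shift direction along which image features can be translated as an augmentation. The backbone name, prompts, and the `strength` parameter are assumptions for illustration.

# A minimal sketch of text-guided feature augmentation in CLIP space,
# assuming OpenAI's `clip` package; hypothetical prompts and strength.
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Crafted unseen-domain prompts; these need not match the test domains.
source_prompt = "a photo"
domain_prompts = ["a painting", "a sketch", "a cartoon"]

with torch.no_grad():
    tokens = clip.tokenize([source_prompt] + domain_prompts).to(device)
    text_feats = model.encode_text(tokens).float()
    text_feats = text_feats / text_feats.norm(dim=-1, keepdim=True)

# Domain-shift directions: crafted-domain embedding minus source embedding.
shift_dirs = text_feats[1:] - text_feats[0]

def augment(image_feats: torch.Tensor, strength: float = 0.5) -> torch.Tensor:
    """Translate image features along a randomly chosen text-derived
    domain-shift direction, then re-normalize to the unit hypersphere."""
    idx = torch.randint(len(shift_dirs), (image_feats.shape[0],))
    shifted = image_feats + strength * shift_dirs[idx]
    return shifted / shifted.norm(dim=-1, keepdim=True)

Because the shift directions come purely from text embeddings, no augmentation network is trained, which is what makes this family of approaches time-efficient relative to per-domain trainable augmenters.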

Cite

Text

Qi et al. "Generalizing to Unseen Domains via Text-Guided Augmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-72890-7_17

Markdown

[Qi et al. "Generalizing to Unseen Domains via Text-Guided Augmentation." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/qi2024eccv-generalizing/) doi:10.1007/978-3-031-72890-7_17

BibTeX

@inproceedings{qi2024eccv-generalizing,
  title     = {{Generalizing to Unseen Domains via Text-Guided Augmentation}},
  author    = {Qi, Daiqing and Zhao, Handong and Zhang, Aidong and Li, Sheng},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-72890-7_17},
  url       = {https://mlanthology.org/eccv/2024/qi2024eccv-generalizing/}
}