Using Language to Extend to Unseen Domains

Abstract

It is expensive to collect training data for every possible domain that a vision model may encounter when deployed. We instead consider how simply *verbalizing* the training domain (e.g., "photos of birds") as well as domains we want to extend to but do not have data for (e.g., "paintings of birds") can improve robustness. Using a multimodal model with a joint image and language embedding space, our method *LADS* learns a transformation of the image embeddings from the source domain to each target domain, while preserving task-relevant information. Without using any images from the target domain, we show that over the *extended* domain containing both source and target, *LADS* outperforms standard fine-tuning and ensemble approaches on a suite of 4 benchmarks targeting domain adaptation and dataset bias.
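
The abstract describes learning a transformation of source-domain image embeddings toward a verbalized target domain while keeping class information intact. The sketch below is a minimal, illustrative rendering of that idea with PyTorch, assuming CLIP-style L2-normalized embeddings and two stand-in losses (a domain-alignment term and a class-consistency term); the names, network shape, and exact loss formulations are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch only: learn a mapping of source image embeddings toward a
# target domain described in text, while preserving the class prediction.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DomainAugmenter(nn.Module):
    """Maps a source image embedding to a hypothetical target-domain embedding."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def training_step(aug, img_emb, class_text_emb, labels,
                  source_domain_emb, target_domain_emb):
    """One illustrative step: push embeddings toward the target-domain text
    while keeping the class predicted from class text embeddings unchanged."""
    z = aug(F.normalize(img_emb, dim=-1))
    # Domain-alignment term: transformed embeddings should be closer to the
    # target-domain text embedding than to the source-domain one.
    domain_logits = z @ torch.stack([source_domain_emb, target_domain_emb]).T
    domain_loss = F.cross_entropy(domain_logits,
                                  torch.ones(len(z), dtype=torch.long))
    # Class-consistency term: transformed embeddings still classify correctly.
    class_logits = z @ class_text_emb.T
    class_loss = F.cross_entropy(class_logits, labels)
    return domain_loss + class_loss

# Toy usage with random stand-ins for CLIP embeddings (512-d, normalized).
dim, n_classes, batch = 512, 10, 8
aug = DomainAugmenter(dim)
opt = torch.optim.Adam(aug.parameters(), lr=1e-4)
img_emb = F.normalize(torch.randn(batch, dim), dim=-1)
class_text_emb = F.normalize(torch.randn(n_classes, dim), dim=-1)
src_dom = F.normalize(torch.randn(dim), dim=-1)  # e.g. text("a photo of a bird")
tgt_dom = F.normalize(torch.randn(dim), dim=-1)  # e.g. text("a painting of a bird")
labels = torch.randint(0, n_classes, (batch,))
loss = training_step(aug, img_emb, class_text_emb, labels, src_dom, tgt_dom)
opt.zero_grad(); loss.backward(); opt.step()
```

In this reading, the augmenter produces target-domain views of the source embeddings without any target images, so they can be used alongside the originals when training a classifier over the extended domain.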

Cite

Text

Dunlap et al. "Using Language to Extend to Unseen Domains." International Conference on Learning Representations, 2023.

Markdown

[Dunlap et al. "Using Language to Extend to Unseen Domains." International Conference on Learning Representations, 2023.](https://mlanthology.org/iclr/2023/dunlap2023iclr-using/)

BibTeX

@inproceedings{dunlap2023iclr-using,
  title     = {{Using Language to Extend to Unseen Domains}},
  author    = {Dunlap, Lisa and Mohri, Clara and Guillory, Devin and Zhang, Han and Darrell, Trevor and Gonzalez, Joseph E. and Raghunathan, Aditi and Rohrbach, Anna},
  booktitle = {International Conference on Learning Representations},
  year      = {2023},
  url       = {https://mlanthology.org/iclr/2023/dunlap2023iclr-using/}
}