Towards Multimodal Open-Set Domain Generalization and Adaptation Through Self-Supervision

Abstract

The task of open-set domain generalization (OSDG) involves recognizing novel classes within unseen domains, which becomes more challenging with multiple modalities as input. Existing works have only addressed unimodal OSDG within the meta-learning framework, without considering multimodal scenarios. In this work, we introduce a novel approach to address Multimodal Open-Set Domain Generalization (MM-OSDG) for the first time, utilizing self-supervision. To this end, we introduce two innovative multimodal self-supervised pretext tasks: Masked Cross-modal Translation and Multimodal Jigsaw Puzzles. These tasks facilitate the learning of multimodal representative features, thereby enhancing generalization and open-class detection capabilities. Additionally, we propose a novel entropy weighting mechanism to balance the loss across different modalities. Furthermore, we extend our approach to also tackle the Multimodal Open-Set Domain Adaptation (MM-OSDA) problem, in which unlabeled data from the target domain is available. Extensive experiments conducted under MM-OSDG, MM-OSDA, and Multimodal Closed-Set DG settings on the EPIC-Kitchens and HAC datasets demonstrate the efficacy and versatility of the proposed approach. Our source code is publicly available at https://github.com/donghao51/MOOSA.
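The abstract does not spell out how the entropy weighting is computed, so the following is only a minimal sketch of one plausible reading: each modality's prediction entropy is turned into a weight, so that more confident (lower-entropy) modalities contribute more to the combined loss. The softmax-over-negative-entropy form, the function names `entropy`, `entropy_weights`, and `weighted_loss`, and the example probabilities and loss values are all assumptions made for illustration; they are not taken from the paper or its repository.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_weights(per_modality_probs):
    """Assumed weighting rule: softmax over negative entropies.

    Modalities with lower prediction entropy receive larger weights,
    and the weights sum to 1.
    """
    neg_entropies = [-entropy(p) for p in per_modality_probs]
    shift = max(neg_entropies)  # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in neg_entropies]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_loss(per_modality_losses, weights):
    """Combine per-modality losses using the given weights."""
    return sum(w * l for w, l in zip(weights, per_modality_losses))


# Hypothetical class probabilities from two modalities for one sample.
video_probs = [0.90, 0.05, 0.05]  # confident prediction, low entropy
audio_probs = [0.40, 0.30, 0.30]  # uncertain prediction, high entropy

weights = entropy_weights([video_probs, audio_probs])
# Hypothetical per-modality loss values, for illustration only.
total_loss = weighted_loss([0.2, 1.1], weights)
```

In this sketch the video modality receives the larger weight because its prediction is more peaked. Other choices, such as inverse-entropy normalization, would fit the abstract's description equally well.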

Cite

Text

Dong et al. "Towards Multimodal Open-Set Domain Generalization and Adaptation Through Self-Supervision." Proceedings of the European Conference on Computer Vision (ECCV), 2024. doi:10.1007/978-3-031-73202-7_16

Markdown

[Dong et al. "Towards Multimodal Open-Set Domain Generalization and Adaptation Through Self-Supervision." Proceedings of the European Conference on Computer Vision (ECCV), 2024.](https://mlanthology.org/eccv/2024/dong2024eccv-multimodal/) doi:10.1007/978-3-031-73202-7_16

BibTeX

@inproceedings{dong2024eccv-multimodal,
  title     = {{Towards Multimodal Open-Set Domain Generalization and Adaptation Through Self-Supervision}},
  author    = {Dong, Hao and Chatzi, Eleni and Fink, Olga},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2024},
  doi       = {10.1007/978-3-031-73202-7_16},
  url       = {https://mlanthology.org/eccv/2024/dong2024eccv-multimodal/}
}