Domain-Invariant Disentangled Network for Generalizable Object Detection

Abstract

We address the problem of domain generalizable object detection, which aims to learn a domain-invariant detector from multiple "seen" domains so that it generalizes well to other "unseen" domains. This generalization ability is crucial in practical scenarios, especially when data are difficult to collect. Compared to image classification, domain generalization in object detection has seldom been explored, and it poses additional challenges because domain gaps arise on both the image and instance levels. In this paper, we propose a novel generalizable object detection model, termed Domain-Invariant Disentangled Network (DIDN). Rather than directly aligning multiple source domains, we integrate a disentangled network into Faster R-CNN. By disentangling representations on both the image and instance levels, DIDN learns domain-invariant representations that are suitable for generalized object detection. Furthermore, we design a cross-level representation reconstruction to complement this two-level disentanglement so that informative object representations are preserved. Extensive experiments on five benchmark datasets demonstrate that our model achieves state-of-the-art performance on domain generalization for object detection.

Cite

Text

Lin et al. "Domain-Invariant Disentangled Network for Generalizable Object Detection." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00865

Markdown

[Lin et al. "Domain-Invariant Disentangled Network for Generalizable Object Detection." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/lin2021iccv-domaininvariant/) doi:10.1109/ICCV48922.2021.00865

BibTeX

@inproceedings{lin2021iccv-domaininvariant,
  title     = {{Domain-Invariant Disentangled Network for Generalizable Object Detection}},
  author    = {Lin, Chuang and Yuan, Zehuan and Zhao, Sicheng and Sun, Peize and Wang, Changhu and Cai, Jianfei},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {8771--8780},
  doi       = {10.1109/ICCV48922.2021.00865},
  url       = {https://mlanthology.org/iccv/2021/lin2021iccv-domaininvariant/}
}