Learning to Detect Every Thing in an Open World

Abstract

Many open-world applications require the detection of novel objects, yet state-of-the-art object detection and instance segmentation networks do not excel at this task. The key issue lies in their assumption that regions without any annotations should be suppressed as negatives, which teaches the model to treat any unannotated (hidden) objects as background. To address this issue, we propose a simple yet surprisingly powerful data augmentation and training scheme we call Learning to Detect Every Thing (LDET). To avoid suppressing hidden objects, we develop a new data augmentation method, BackErase, which pastes annotated objects onto a background image sampled from a small region of the original image. Since training solely on such synthetically augmented images suffers from domain shift, we propose a multi-domain training strategy that allows the model to generalize to real images. LDET leads to significant improvements on many datasets in the open-world instance segmentation task, outperforming baselines in cross-category generalization on COCO as well as in cross-dataset evaluation on UVO, Objects365, and Cityscapes.
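To make the augmentation concrete, here is a minimal NumPy sketch of a BackErase-style transform, assuming images as HxWx3 uint8 arrays and one boolean mask per annotated instance. The function name `back_erase`, the `bg_crop_scale` parameter, and the nearest-neighbor upsampling are illustrative assumptions, not the paper's exact implementation; the intent it captures is the one stated in the abstract, namely that only annotated objects survive in the foreground, so no hidden object can be suppressed as a negative.

```python
import numpy as np


def back_erase(image, masks, bg_crop_scale=0.1, rng=None):
    """Sketch of a BackErase-style augmentation (hypothetical re-implementation).

    image: HxWx3 uint8 array.
    masks: list of HxW boolean arrays, one per annotated instance.
    bg_crop_scale: relative size of the background patch to sample; a small
        patch is unlikely to contain hidden (unannotated) objects.
    """
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]

    # 1. Sample a small patch from the original image to act as background.
    ph, pw = max(1, int(h * bg_crop_scale)), max(1, int(w * bg_crop_scale))
    y0 = rng.integers(0, h - ph + 1)
    x0 = rng.integers(0, w - pw + 1)
    patch = image[y0:y0 + ph, x0:x0 + pw]

    # 2. Upsample the patch to full resolution (nearest-neighbor for simplicity).
    ys = np.arange(h) * ph // h
    xs = np.arange(w) * pw // w
    background = patch[ys[:, None], xs[None, :]]

    # 3. Paste every annotated instance onto the synthetic background, so the
    #    resulting image contains only labeled objects in the foreground.
    out = background.copy()
    for m in masks:
        out[m] = image[m]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.integers(0, 256, size=(480, 640, 3), dtype=np.uint8)
    mask = np.zeros((480, 640), dtype=bool)
    mask[100:200, 150:300] = True  # one fake annotated instance
    aug = back_erase(img, [mask], rng=rng)
    print(aug.shape)  # (480, 640, 3)
```

Training only on such synthetic images is what the abstract identifies as the domain-shift problem; the paper's multi-domain training strategy, which combines supervision from the augmented and real images, is what lets the detector transfer back to real data.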

Cite

Text

Saito et al. "Learning to Detect Every Thing in an Open World." Proceedings of the European Conference on Computer Vision (ECCV), 2022. doi:10.1007/978-3-031-20053-3_16

Markdown

[Saito et al. "Learning to Detect Every Thing in an Open World." Proceedings of the European Conference on Computer Vision (ECCV), 2022.](https://mlanthology.org/eccv/2022/saito2022eccv-learning/) doi:10.1007/978-3-031-20053-3_16

BibTeX

@inproceedings{saito2022eccv-learning,
  title     = {{Learning to Detect Every Thing in an Open World}},
  author    = {Saito, Kuniaki and Hu, Ping and Darrell, Trevor and Saenko, Kate},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2022},
  doi       = {10.1007/978-3-031-20053-3_16},
  url       = {https://mlanthology.org/eccv/2022/saito2022eccv-learning/}
}