CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

Abstract

Deriving reliable region-word alignment from image-text pairs is critical to learn object-level vision-language representations for open-vocabulary object detection. Existing methods typically rely on pre-trained or self-trained vision-language models for alignment, which are prone to limitations in localization accuracy or generalization capabilities. In this paper, we propose CoDet, a novel approach that overcomes the reliance on pre-aligned vision-language space by reformulating region-word alignment as a co-occurring object discovery problem. Intuitively, by grouping images that mention a shared concept in their captions, objects corresponding to the shared concept shall exhibit high co-occurrence among the group. CoDet then leverages visual similarities to discover the co-occurring objects and align them with the shared concept. Extensive experiments demonstrate that CoDet has superior performances and compelling scalability in open-vocabulary detection, e.g., by scaling up the visual backbone, CoDet achieves 37.0 $AP^m_{novel}$ and 44.7 $AP^m_{all}$ on OV-LVIS, surpassing the previous SoTA by 4.2 $AP^m_{novel}$ and 9.8 $AP^m_{all}$. Code is available at https://github.com/CVMI-Lab/CoDet.
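
The co-occurrence intuition in the abstract can be illustrated with a toy sketch. This is not the authors' implementation (see the linked repository for that). It assumes region features have already been extracted by some region proposal model, and the function names `group_by_concept` and `co_occurrence_scores`, the whitespace tokenization, and the max-then-mean scoring rule are all hypothetical choices made for illustration.

```python
import numpy as np


def group_by_concept(captions, concepts):
    """Map each concept word to the indices of the captions that mention it.

    Assumption: naive lowercase whitespace tokenization stands in for
    whatever concept parsing a real pipeline would use.
    """
    groups = {}
    for idx, caption in enumerate(captions):
        words = set(caption.lower().split())
        for concept in concepts:
            if concept in words:
                groups.setdefault(concept, []).append(idx)
    return groups


def l2_normalize(features):
    """Scale each row to unit length so dot products are cosine similarities."""
    return features / np.linalg.norm(features, axis=-1, keepdims=True)


def co_occurrence_scores(query_regions, support_regions_list):
    """Score each query region by how consistently it reappears in the group.

    For every support image, take the best cosine match of each query region,
    then average over support images (an illustrative scoring rule, not
    necessarily the one used in the paper).
    """
    query = l2_normalize(query_regions)
    scores = np.zeros(len(query))
    for support in support_regions_list:
        similarity = query @ l2_normalize(support).T  # (n_query, n_support)
        scores += similarity.max(axis=1)
    return scores / len(support_regions_list)


# Toy data: three captions mention "dog"; the shared object is the e0 direction.
captions = ["a dog on grass", "the dog runs", "a dog by a door"]
e = np.eye(4)
region_features = [
    np.stack([e[1], e[0]]),  # image 0: distractor, shared object
    np.stack([e[0], e[2]]),  # image 1: shared object, distractor
    np.stack([e[3], e[0]]),  # image 2: distractor, shared object
]

group = group_by_concept(captions, ["dog"])["dog"]
query_idx, support_idx = group[0], group[1:]
scores = co_occurrence_scores(
    region_features[query_idx],
    [region_features[i] for i in support_idx],
)
aligned_region = int(np.argmax(scores))  # region to pair with the word "dog"
```

In this toy setup, the region that recurs across all images in the group receives the highest score and is the one aligned with the shared concept word, while regions unique to a single image score low.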

Cite

Text

Ma et al. "CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection." Neural Information Processing Systems, 2023.

Markdown

[Ma et al. "CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection." Neural Information Processing Systems, 2023.](https://mlanthology.org/neurips/2023/ma2023neurips-codet/)

BibTeX

@inproceedings{ma2023neurips-codet,
  title     = {{CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection}},
  author    = {Ma, Chuofan and Jiang, Yi and Wen, Xin and Yuan, Zehuan and Qi, Xiaojuan},
  booktitle = {Neural Information Processing Systems},
  year      = {2023},
  url       = {https://mlanthology.org/neurips/2023/ma2023neurips-codet/}
}