PreDet: Large-Scale Weakly Supervised Pre-Training for Detection

Abstract

State-of-the-art object detection approaches typically rely on pre-trained classification models to achieve better performance and faster convergence. We hypothesize that classification pre-training strives to achieve translation invariance and consequently ignores the localization aspect of the problem. We propose a new large-scale pre-training strategy for detection, where noisy class labels are available for all images, but not bounding boxes. In this setting, we augment standard classification pre-training with a new detection-specific pretext task. Motivated by self-supervised approaches based on noise-contrastive learning, we design a task that forces bounding boxes with high overlap to have similar representations across different views of an image, compared to non-overlapping boxes. We redesign Faster R-CNN modules to perform this task efficiently. Our experimental results show significant improvements over existing weakly-supervised and self-supervised pre-training approaches in both detection accuracy and fine-tuning speed.
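To make the pretext task concrete, the sketch below shows one plausible instantiation of such a box-level contrastive objective: an InfoNCE-style loss in which box embeddings from two augmented views of the same image are pulled together when their boxes overlap strongly (after mapping both views back to shared image coordinates) and pushed apart otherwise. This is a minimal illustration under assumed details, not the paper's implementation; the function names, the `iou_pos` threshold, and the temperature value are all hypothetical choices.

```python
import torch
import torch.nn.functional as F

def box_iou(a, b):
    """IoU between every box in a (N, 4) and b (M, 4); boxes are (x1, y1, x2, y2)."""
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    lt = torch.max(a[:, None, :2], b[None, :, :2])   # top-left of intersections
    rb = torch.min(a[:, None, 2:], b[None, :, 2:])   # bottom-right of intersections
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    return inter / (area_a[:, None] + area_b[None, :] - inter)

def box_contrastive_loss(emb_v1, boxes_v1, emb_v2, boxes_v2,
                         iou_pos=0.5, temperature=0.1):
    """Illustrative noise-contrastive loss over box embeddings from two views.

    emb_v1 (N, D) / boxes_v1 (N, 4): box embeddings and boxes from view 1,
    emb_v2 (M, D) / boxes_v2 (M, 4): likewise for view 2, with boxes expressed
    in the original image's coordinate frame. Cross-view box pairs with
    IoU >= iou_pos are treated as positives; all other pairs act as negatives.
    """
    emb_v1 = F.normalize(emb_v1, dim=1)
    emb_v2 = F.normalize(emb_v2, dim=1)

    # Which cross-view box pairs count as positives (high overlap).
    pos_mask = (box_iou(boxes_v1, boxes_v2) >= iou_pos).float()   # (N, M)

    # Cosine-similarity logits between every cross-view box pair.
    logits = emb_v1 @ emb_v2.t() / temperature                    # (N, M)
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Average negative log-likelihood of positives, for boxes that have any.
    has_pos = pos_mask.sum(dim=1) > 0
    loss = -(pos_mask * log_prob).sum(dim=1)[has_pos] / pos_mask.sum(dim=1)[has_pos]
    return loss.mean()
```

In practice the box embeddings would come from the detector's region features (e.g., RoI-pooled features passed through a projection head), and the image-level class labels would supervise a separate classification loss alongside this term.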

Cite

Text

Ramanathan et al. "PreDet: Large-Scale Weakly Supervised Pre-Training for Detection." International Conference on Computer Vision, 2021. doi:10.1109/ICCV48922.2021.00286

Markdown

[Ramanathan et al. "PreDet: Large-Scale Weakly Supervised Pre-Training for Detection." International Conference on Computer Vision, 2021.](https://mlanthology.org/iccv/2021/ramanathan2021iccv-predet/) doi:10.1109/ICCV48922.2021.00286

BibTeX

@inproceedings{ramanathan2021iccv-predet,
  title     = {{PreDet: Large-Scale Weakly Supervised Pre-Training for Detection}},
  author    = {Ramanathan, Vignesh and Wang, Rui and Mahajan, Dhruv},
  booktitle = {International Conference on Computer Vision},
  year      = {2021},
  pages     = {2865--2875},
  doi       = {10.1109/ICCV48922.2021.00286},
  url       = {https://mlanthology.org/iccv/2021/ramanathan2021iccv-predet/}
}