Improving Object Detection with Selective Self-Supervised Self-Training
Abstract
We study how to leverage Web images to augment human-curated object detection datasets. Our approach is two-pronged. On the one hand, we retrieve Web images by image-to-image search, which incurs less domain shift from the curated data than other search methods. The Web images are diverse, supplying a wide variety of object poses, appearances, their interactions with the context, etc. On the other hand, we propose a novel learning method motivated by two parallel lines of work that explore unlabeled data for image classification: self-training and self-supervised learning. They fail to improve object detectors in their vanilla forms due to the domain gap between the Web images and curated datasets. To tackle this challenge, we propose a selective net to rectify the supervision signals in Web images. It not only identifies positive bounding boxes but also creates a safe zone for mining hard negative boxes. We report state-of-the-art results on detecting backpacks and chairs from everyday scenes, along with other challenging object classes.
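As a rough illustration of the selection idea described above, and not the authors' implementation, the sketch below assumes each pseudo-labeled box on a Web image carries a hypothetical selective score in [0, 1]. Boxes scoring high are kept as positives, boxes scoring low form the "safe zone" from which hard negatives may be mined, and everything in between is ignored during training. The names `PseudoBox` and `split_pseudo_boxes` and both thresholds are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PseudoBox:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    score: float  # hypothetical selective score in [0, 1]


def split_pseudo_boxes(
    boxes: List[PseudoBox],
    pos_thresh: float = 0.8,  # assumed value, not taken from the paper
    neg_thresh: float = 0.2,  # assumed value, not taken from the paper
) -> Tuple[List[PseudoBox], List[PseudoBox], List[PseudoBox]]:
    """Partition pseudo boxes into positives, safe negatives, and ignored."""
    if not 0.0 <= neg_thresh < pos_thresh <= 1.0:
        raise ValueError("require 0 <= neg_thresh < pos_thresh <= 1")
    positives = [b for b in boxes if b.score >= pos_thresh]
    safe_negatives = [b for b in boxes if b.score <= neg_thresh]
    # Ambiguous boxes contribute no supervision signal in this sketch.
    ignored = [b for b in boxes if neg_thresh < b.score < pos_thresh]
    return positives, safe_negatives, ignored


candidates = [
    PseudoBox((10, 10, 50, 80), 0.93),
    PseudoBox((60, 20, 90, 70), 0.55),
    PseudoBox((5, 5, 20, 20), 0.07),
]
positives, safe_negatives, ignored = split_pseudo_boxes(candidates)
```

Keeping an explicit ignored band is one simple way to avoid training on uncertain pseudo labels; the paper's selective net learns this decision rather than applying fixed thresholds.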
Cite
Text
Li et al. "Improving Object Detection with Selective Self-Supervised Self-Training." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58526-6_35
Markdown
[Li et al. "Improving Object Detection with Selective Self-Supervised Self-Training." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/li2020eccv-improving-a/) doi:10.1007/978-3-030-58526-6_35
BibTeX
@inproceedings{li2020eccv-improving-a,
title = {{Improving Object Detection with Selective Self-Supervised Self-Training}},
author = {Li, Yandong and Huang, Di and Qin, Danfeng and Wang, Liqiang and Gong, Boqing},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
year = {2020},
doi = {10.1007/978-3-030-58526-6_35},
url = {https://mlanthology.org/eccv/2020/li2020eccv-improving-a/}
}