Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval
Abstract
We introduce SearchDet, a training-free long-tail object detection framework that significantly enhances open-vocabulary object detection performance. SearchDet retrieves a set of positive and negative images of the object to be grounded, embeds these images, and computes a query weighted by the input image, which is then used to detect the desired concept in that image. The method is simple and requires no training, yet achieves over 16.81% mAP improvement on ODinW and 59.85% mAP improvement on LVIS compared to state-of-the-art models such as GroundingDINO. We further show that basing object detection on a set of Web-retrieved exemplars is stable with respect to variations in the exemplars, suggesting a path towards eliminating costly data annotation and training procedures.
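The abstract does not state which embedding model is used, how the exemplars are weighted, or how the negatives enter the query. The Python sketch below is therefore only one plausible reading of the pipeline, not the authors' implementation. It assumes embeddings are already available as NumPy vectors, that each exemplar is weighted by a softmax over its cosine similarity to the input image, and that the negative query is subtracted from the positive one. The function names `weighted_query` and `score_regions` are hypothetical.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along the given axis."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def weighted_query(image_emb, pos_embs, neg_embs):
    """Build one query vector from exemplar embeddings.

    Assumption: exemplars that resemble the input image more closely
    receive larger weights, and negatives are subtracted from positives.
    """
    image_emb = l2_normalize(image_emb)
    pos = l2_normalize(pos_embs)
    neg = l2_normalize(neg_embs)
    pos_w = softmax(pos @ image_emb)  # one weight per positive exemplar
    neg_w = softmax(neg @ image_emb)  # one weight per negative exemplar
    return l2_normalize(pos_w @ pos - neg_w @ neg)

def score_regions(region_embs, query):
    """Cosine similarity between each candidate region and the query."""
    return l2_normalize(region_embs) @ query

# Toy 3-D embeddings standing in for real image features.
image_emb = np.array([0.7, 0.7, 0.1])
pos_embs = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]])
neg_embs = np.array([[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
regions = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

query = weighted_query(image_emb, pos_embs, neg_embs)
scores = score_regions(regions, query)
best_region = int(np.argmax(scores))
```

In this toy setup the region resembling the positive exemplars scores higher than the one resembling the negatives. A real system would replace the toy vectors with features from a pretrained image encoder and the two hand-written regions with candidate regions proposed for the input image.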
Cite
Text
Sidhu et al. "Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval." Conference on Computer Vision and Pattern Recognition, 2025. doi:10.1109/CVPR52734.2025.01409
Markdown
[Sidhu et al. "Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval." Conference on Computer Vision and Pattern Recognition, 2025.](https://mlanthology.org/cvpr/2025/sidhu2025cvpr-search/) doi:10.1109/CVPR52734.2025.01409
BibTeX
@inproceedings{sidhu2025cvpr-search,
title = {{Search and Detect: Training-Free Long Tail Object Detection via Web-Image Retrieval}},
author = {Sidhu, Mankeerat and Chopra, Hetarth and Blume, Ansel and Kim, Jeonghwan and Reddy, Revanth Gangi and Ji, Heng},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2025},
pages = {15129--15138},
doi = {10.1109/CVPR52734.2025.01409},
url = {https://mlanthology.org/cvpr/2025/sidhu2025cvpr-search/}
}