RefDetector: A Simple yet Effective Matching-Based Method for Referring Expression Comprehension

Abstract

Despite rapid and substantial advances, object detection remains limited by pre-defined category sets. Current visual grounding methods focus primarily on leveraging the visual backbone to generate text-tailored visual features, which may require adjusting the parameters of the entire model. In contrast, some early methods, i.e., matching-based methods, build upon existing object detectors and extend them to localize an object from a free-form linguistic expression, which gives them good application potential. However, the potential of the matching-based approach remains largely untapped because it has been insufficiently explored. In this paper, we first analyze the limitations of current matching-based methods (i.e., the mismatch problem and complicated fusion mechanisms), and then present a simple yet effective matching-based method, namely RefDetector. To tackle these issues, we devise a simple heuristic rule to generate proposals with improved referent recall. Additionally, we introduce a straightforward vision-language interaction module that eliminates the need for intricate manually-designed mechanisms. Moreover, we explore visual grounding based on the modern detector DETR and achieve significant performance improvement. Extensive experiments on three REC benchmark datasets, i.e., RefCOCO, RefCOCO+, and RefCOCOg, validate the effectiveness of the proposed method.
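
For readers unfamiliar with the matching-based paradigm the abstract refers to, the following is a minimal sketch of the generic idea only: a detector supplies candidate boxes with region features, and the box whose feature best matches the expression's feature is returned. It is not RefDetector's heuristic rule or interaction module, which the abstract does not specify. The names `cosine` and `match_expression`, the use of cosine similarity as the matching score, and the toy two-dimensional features are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_expression(
    proposals: List[Tuple[Box, Sequence[float]]],
    text_feature: Sequence[float],
) -> Tuple[Box, float]:
    """Return the proposal box whose region feature best matches the text feature.

    `proposals` pairs each detector box with its region feature. In a real
    system both features would come from learned encoders; here they are
    assumed to be given.
    """
    if not proposals:
        raise ValueError("at least one proposal is required")
    scores = [cosine(feature, text_feature) for _, feature in proposals]
    best = max(range(len(scores)), key=scores.__getitem__)
    return proposals[best][0], scores[best]


# Toy usage: two proposals with made-up features.
toy_proposals = [
    ((0.0, 0.0, 10.0, 10.0), [1.0, 0.0]),
    ((5.0, 5.0, 20.0, 20.0), [0.0, 1.0]),
]
best_box, best_score = match_expression(toy_proposals, [0.1, 0.9])
```

Note that this kind of matching can only select among the boxes it is given: if no proposal covers the referent, no score can recover it, which is presumably why the abstract emphasizes proposals with improved referent recall.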

Cite

Text

Wang et al. "RefDetector: A Simple yet Effective Matching-Based Method for Referring Expression Comprehension." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I8.32866

Markdown

[Wang et al. "RefDetector: A Simple yet Effective Matching-Based Method for Referring Expression Comprehension." AAAI Conference on Artificial Intelligence, 2025.](https://mlanthology.org/aaai/2025/wang2025aaai-refdetector/) doi:10.1609/AAAI.V39I8.32866

BibTeX

@inproceedings{wang2025aaai-refdetector,
  title     = {{RefDetector: A Simple yet Effective Matching-Based Method for Referring Expression Comprehension}},
  author    = {Wang, Yabing and Tian, Zhuotao and Qin, Zheng and Zhou, Sanping and Wang, Le},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2025},
  pages     = {8033-8041},
  doi       = {10.1609/AAAI.V39I8.32866},
  url       = {https://mlanthology.org/aaai/2025/wang2025aaai-refdetector/}
}