Recursive Coarse-to-Fine Localization for Fast Object Detection

Abstract

Cascading techniques are commonly used to speed up the scan of an image for object detection. However, cascades of detectors are slow to train because of the large number of detectors and corresponding thresholds to learn. Furthermore, they do not use any prior knowledge about the scene structure to decide where to focus the search. To handle these problems, we propose a new way to scan an image that couples a recursive coarse-to-fine refinement with spatial constraints on the object location. To do so, we split an image into a set of uniformly distributed neighborhood regions, and for each of these we apply a local greedy search over feature resolutions. A neighborhood is defined as a scanning region that only one object can occupy. Therefore, the best hypothesis in each neighborhood is the location with the maximum score, and no thresholds are needed. We present an implementation of our method using a pyramid of HOG features and evaluate it on two standard datasets, VOC2007 and INRIA. Results show that Recursive Coarse-to-Fine Localization (RCFL) achieves a 12x speed-up compared to standard sliding windows. Compared with a multiple-resolution cascade, our method performs slightly better in both speed and Average Precision. Furthermore, in contrast to cascading approaches, the speed-up is independent of image conditions, the number of detected objects, and clutter.
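The search described above (one neighborhood per coarse cell, a greedy descent through finer resolutions, and the maximum score kept without any threshold) can be illustrated with a small sketch. It is not the authors' implementation. The scoring function `score_fn`, the 2x2 set of child cells examined at each finer level, the factor-of-two resolution step between levels, and the summing of per-level scores are all illustrative assumptions. The actual method scores HOG-based models, and its neighborhoods may be shaped differently.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical scorer: score_fn(level, y, x) -> detector score for the window
# anchored at cell (y, x) of the feature grid at pyramid `level`
# (level 0 = coarsest; each finer level is ASSUMED to double the grid size).
ScoreFn = Callable[[int, int, int], float]


def rcfl_search(score_fn: ScoreFn, h0: int, w0: int, levels: int
                ) -> Tuple[List[Tuple[float, int, int]], int]:
    """Greedy coarse-to-fine search with one hypothesis per coarse neighborhood.

    Returns (detections, n_evals). `detections` holds one
    (accumulated_score, y, x) per neighborhood, with (y, x) on the finest grid,
    and `n_evals` counts how many times score_fn was called.
    """
    detections: List[Tuple[float, int, int]] = []
    n_evals = 0
    for y0 in range(h0):
        for x0 in range(w0):
            # Each coarse cell is one neighborhood: at most one object inside.
            y, x = y0, x0
            total = score_fn(0, y, x)
            n_evals += 1
            for level in range(1, levels):
                # ASSUMPTION: refine among the 2x2 child cells at the finer level.
                best: Optional[Tuple[float, int, int]] = None
                for dy in (0, 1):
                    for dx in (0, 1):
                        cy, cx = 2 * y + dy, 2 * x + dx
                        s = score_fn(level, cy, cx)
                        n_evals += 1
                        if best is None or s > best[0]:
                            best = (s, cy, cx)
                # Greedy step: keep only the best child and drop the rest.
                # No threshold is applied here.
                s, y, x = best
                total += s  # ASSUMPTION: per-level scores are summed
            detections.append((total, y, x))
    return detections, n_evals


# Toy usage with a hypothetical score: the "object" sits at cell (5, 6) of the
# finest 8x8 grid, and a cell scores higher the closer it maps to that cell.
TARGET = (5, 6)
LEVELS = 3


def toy_score(level: int, y: int, x: int) -> float:
    scale = 2 ** (LEVELS - 1 - level)  # finest-grid cells per cell at `level`
    return -(abs(y * scale - TARGET[0]) + abs(x * scale - TARGET[1]))


dets, n_evals = rcfl_search(toy_score, h0=2, w0=2, levels=LEVELS)
best = max(dets)  # highest-scoring neighborhood hypothesis
print(best, n_evals)  # (-4, 5, 6) 36
```

In this toy setting, the four neighborhoods cost 36 score evaluations in total (1 + 4 + 4 each), compared with 64 for scoring every cell of the finest 8x8 grid. Because exactly one hypothesis is returned per neighborhood, the detections can be ranked by score without tuning any per-stage threshold. This arithmetic only illustrates the idea and is not the speed-up reported in the paper.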

Cite

Text

Pedersoli et al. "Recursive Coarse-to-Fine Localization for Fast Object Detection." European Conference on Computer Vision, 2010. doi:10.1007/978-3-642-15567-3_21

Markdown

[Pedersoli et al. "Recursive Coarse-to-Fine Localization for Fast Object Detection." European Conference on Computer Vision, 2010.](https://mlanthology.org/eccv/2010/pedersoli2010eccv-recursive/) doi:10.1007/978-3-642-15567-3_21

BibTeX

@inproceedings{pedersoli2010eccv-recursive,
  title     = {{Recursive Coarse-to-Fine Localization for Fast Object Detection}},
  author    = {Pedersoli, Marco and Gonzàlez, Jordi and Bagdanov, Andrew D. and Villanueva, Juan José},
  booktitle = {European Conference on Computer Vision},
  year      = {2010},
  pages     = {280--293},
  doi       = {10.1007/978-3-642-15567-3_21},
  url       = {https://mlanthology.org/eccv/2010/pedersoli2010eccv-recursive/}
}