Self-Taught Object Localization with Deep Networks
Abstract
This paper introduces self-taught object localization, a novel approach that leverages deep convolutional networks trained for whole-image recognition to localize objects in images without additional human supervision, i.e., without using any ground-truth bounding boxes for training. The key idea is to analyze the change in the recognition scores when artificially masking out different regions of the image. Masking out a region that includes the object typically causes a significant drop in recognition score. This idea is embedded into an agglomerative clustering technique that generates self-taught localization hypotheses. Our object localization scheme outperforms existing proposal methods in both precision and recall for a small number of subwindow proposals (e.g., on ILSVRC-2012 it produces a relative gain of 23.4% over the state of the art for the top-1 hypothesis). Furthermore, our experiments show that the annotations automatically generated by our method can be used to train object detectors, yielding recognition results remarkably close to those obtained by training on manually annotated bounding boxes.
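The masking analysis described above can be illustrated with a minimal sketch. The snippet below is a hypothetical illustration, not the paper's implementation: it slides a square occluder over the image, re-scores each occluded version with a given classifier, and records the drop in the target-class score. The function names (`mask_importance`, `classify`) and the fixed-grid masking are assumptions for clarity; the paper instead masks regions obtained from agglomerative clustering of segments and uses a CNN's recognition scores.

```python
import numpy as np

def mask_importance(image, classify, target_class, mask_size=32, stride=32):
    """Score each grid cell by the drop in the classifier's confidence
    for `target_class` when that cell is masked out (filled with the
    image mean). Large drops indicate regions likely containing the object."""
    h, w = image.shape[:2]
    base = classify(image)[target_class]          # score on the unmasked image
    fill = image.mean(axis=(0, 1))                # neutral fill value
    rows = (h - mask_size) // stride + 1
    cols = (w - mask_size) // stride + 1
    heat = np.zeros((rows, cols))
    for i, y in enumerate(range(0, h - mask_size + 1, stride)):
        for j, x in enumerate(range(0, w - mask_size + 1, stride)):
            masked = image.copy()
            masked[y:y + mask_size, x:x + mask_size] = fill
            # drop in recognition score caused by masking this region
            heat[i, j] = base - classify(masked)[target_class]
    return heat
```

In the actual method, the cell with the largest score drop would seed a localization hypothesis; here, any callable returning per-class scores can stand in for the CNN.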
Cite
Text
Bazzani et al. "Self-Taught Object Localization with Deep Networks." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016. doi:10.1109/WACV.2016.7477688
Markdown
[Bazzani et al. "Self-Taught Object Localization with Deep Networks." IEEE/CVF Winter Conference on Applications of Computer Vision, 2016.](https://mlanthology.org/wacv/2016/bazzani2016wacv-self/) doi:10.1109/WACV.2016.7477688
BibTeX
@inproceedings{bazzani2016wacv-self,
title = {{Self-Taught Object Localization with Deep Networks}},
author = {Bazzani, Loris and Bergamo, Alessandro and Anguelov, Dragomir and Torresani, Lorenzo},
booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision},
year = {2016},
pages = {1--9},
doi = {10.1109/WACV.2016.7477688},
url = {https://mlanthology.org/wacv/2016/bazzani2016wacv-self/}
}