R-FCN-3000 at 30fps: Decoupling Detection and Classification
Abstract
We propose a modular approach towards large-scale real-time object detection by decoupling objectness detection and classification. We exploit the fact that many object classes are visually similar and share parts. Thus, a universal objectness detector can be learned for class-agnostic object detection followed by fine-grained classification using a (non)linear classifier. Our approach is a modification of the R-FCN architecture to learn shared filters for performing localization across different object classes. We trained a detector for 3000 object classes, called R-FCN-3000, that obtains an mAP of 34.9% on the ImageNet detection dataset. It outperforms YOLO-9000 by 18% while processing 30 images per second. We also show that the objectness learned by R-FCN-3000 generalizes to novel classes and the performance increases with the number of training object classes - supporting the hypothesis that it is possible to learn a universal objectness detector.
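The decoupling described in the abstract can be illustrated with a minimal sketch. It assumes that each region proposal carries one class-agnostic objectness score and a vector of fine-grained class logits, and that the two are combined by multiplication. The abstract does not state the combination rule, so that choice is an assumption, as are the function names, the proposal layout, and the threshold, which are all hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decoupled_scores(objectness, class_logits):
    """Combine a class-agnostic objectness score with fine-grained
    class probabilities (assumed rule: simple multiplication)."""
    probs = softmax(class_logits)
    return [objectness * p for p in probs]


def detect(proposals, threshold=0.3):
    """Keep the best class for each proposal whose combined score
    reaches the (hypothetical) threshold.

    Each proposal is a tuple (box, objectness, class_logits).
    """
    results = []
    for box, objectness, class_logits in proposals:
        scores = decoupled_scores(objectness, class_logits)
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            results.append((box, best, scores[best]))
    return results
```

Under this reading, localization depends only on the shared objectness score, so growing the number of classes only lengthens the classifier's logit vector and leaves the localization step unchanged.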
Cite
Text
Singh et al. "R-FCN-3000 at 30fps: Decoupling Detection and Classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018. doi:10.1109/CVPR.2018.00119
Markdown
[Singh et al. "R-FCN-3000 at 30fps: Decoupling Detection and Classification." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.](https://mlanthology.org/cvpr/2018/singh2018cvpr-rfcn3000/) doi:10.1109/CVPR.2018.00119
BibTeX
@inproceedings{singh2018cvpr-rfcn3000,
title = {{R-FCN-3000 at 30fps: Decoupling Detection and Classification}},
author = {Singh, Bharat and Li, Hengduo and Sharma, Abhishek and Davis, Larry S.},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
year = {2018},
doi = {10.1109/CVPR.2018.00119},
url = {https://mlanthology.org/cvpr/2018/singh2018cvpr-rfcn3000/}
}