Two-Stage Training for Improved Classification of Poorly Localized Object Images
Abstract
State-of-the-art object classifiers finetuned from a pretrained model (e.g. one pretrained on ImageNet) on a domain-specific dataset can accurately classify well-localized object images. However, such classifiers often fail on poorly localized images: images with a lot of surrounding context, or with heavily occluded, partially visible, or off-centered objects. In this paper, we propose a two-stage training scheme to improve the classification of such noisy detections, which are often produced by low-compute algorithms that run on the edge, such as motion-based background removal techniques. The proposed pipeline first trains a classifier from scratch with extreme image augmentation and then finetunes it in a second stage. The first stage incorporates a lot of contextual information around the objects, given access to the corresponding full images. This stage works very well for classifying poorly localized input images, but it generates many false positives by classifying non-object images as objects. To reduce the false positives, the second stage trains on the tight ground-truth bounding boxes (as is done traditionally), using the model trained in the first stage as the initial model and adjusting its weights very slowly during training. To demonstrate the efficacy of our approach, we curated a new classification dataset for poorly localized images, the noisy PASCAL VOC 2007 test dataset. Using this dataset, we show that the proposed two-stage training scheme can significantly improve the accuracy of the trained classifier on both well-localized and poorly localized object images.
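The abstract does not spell out the augmentation or the training hyperparameters, so the following is only a rough sketch of how the two stages might be organized, not the authors' implementation. It assumes that "extreme augmentation" can be approximated by growing and shifting each ground-truth box inside the full image, and that "very slowly adjusting" the weights corresponds to a much smaller learning rate in the second stage. The names `sample_noisy_crop`, `crop_for_stage`, and `STAGES`, and every numeric value, are hypothetical.

```python
import random

def sample_noisy_crop(box, image_size, max_context=1.0, max_shift=0.3, rng=None):
    """Sample a crop (x0, y0, x1, y1) that loosely contains a ground-truth box.

    box: tight ground-truth box (x0, y0, x1, y1), assumed to lie inside the image.
    image_size: (width, height) of the full image.
    max_context, max_shift: hypothetical caps, as fractions of the box size.
    """
    rng = rng or random.Random()
    x0, y0, x1, y1 = box
    img_w, img_h = image_size
    w, h = x1 - x0, y1 - y0

    # Grow each side independently: adds context and pushes the object off-center.
    left, right = rng.uniform(0, max_context) * w, rng.uniform(0, max_context) * w
    top, bottom = rng.uniform(0, max_context) * h, rng.uniform(0, max_context) * h

    # Shift the whole crop: this can cut off part of the object.
    dx = rng.uniform(-max_shift, max_shift) * w
    dy = rng.uniform(-max_shift, max_shift) * h

    # Clamp the crop to the image bounds.
    cx0 = max(0.0, x0 - left + dx)
    cy0 = max(0.0, y0 - top + dy)
    cx1 = min(float(img_w), x1 + right + dx)
    cy1 = min(float(img_h), y1 + bottom + dy)
    return (cx0, cy0, cx1, cy1)

# Hypothetical schedule: stage 2 starts from the stage 1 weights, trains on
# tight boxes, and uses a far smaller learning rate (an assumed reading of
# "very slowly adjusting its weights").
STAGES = [
    {"name": "stage1", "init": "scratch", "crops": "noisy", "lr": 1e-2},
    {"name": "stage2", "init": "stage1", "crops": "tight", "lr": 1e-4},
]

def crop_for_stage(stage, box, image_size, rng=None):
    """Return the training crop for a box under the given stage."""
    if stage["crops"] == "noisy":
        return sample_noisy_crop(box, image_size, rng=rng)
    return tuple(box)  # tight ground-truth box, as in conventional training
```

In a training loop, `crop_for_stage(STAGES[0], box, size)` would feed the first stage, and the same call with `STAGES[1]` would feed the finetuning stage once the first-stage weights have been loaded.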
Cite
Text
Bondugula et al. "Two-Stage Training for Improved Classification of Poorly Localized Object Images." European Conference on Computer Vision Workshops, 2020. doi:10.1007/978-3-030-68238-5_18
Markdown
[Bondugula et al. "Two-Stage Training for Improved Classification of Poorly Localized Object Images." European Conference on Computer Vision Workshops, 2020.](https://mlanthology.org/eccvw/2020/bondugula2020eccvw-twostage/) doi:10.1007/978-3-030-68238-5_18
BibTeX
@inproceedings{bondugula2020eccvw-twostage,
title = {{Two-Stage Training for Improved Classification of Poorly Localized Object Images}},
author = {Bondugula, Sravanthi and Qian, Gang and Beach, Allison},
booktitle = {European Conference on Computer Vision Workshops},
year = {2020},
pages = {245-260},
doi = {10.1007/978-3-030-68238-5_18},
url = {https://mlanthology.org/eccvw/2020/bondugula2020eccvw-twostage/}
}