Patch-Level Augmentation for Object Detection in Aerial Images

Abstract

Object detection in specific views (e.g., top view, road view, and aerial view) suffers from a lack of datasets, which causes class imbalance and makes it difficult to cover hard examples. To handle these issues, we propose a hard chip mining method that balances the ratio of each class and generates hard examples that are effective for model training. First, we generate multi-scale chips to train an object detector. Next, we extract object patches from the dataset to construct an object pool; these patches are then used to augment the dataset, overcoming the class imbalance problem. After that, we run inference with the trained detector on the augmented images and generate hard chips from the misclassified regions. Finally, we train the final detector with both normal and hard chips. The proposed method achieves superior results on the VisDrone dataset both qualitatively and quantitatively, and our model ranked 3rd in the VisDrone-DET2019 challenge (http://aiskyeye.com/).
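The object-pool augmentation step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper names (`build_object_pool`, `augment_image`), the naive copy-paste placement, and the fixed paste count are assumptions; the paper's actual pipeline operates on multi-scale chips and presumably handles overlap and blending more carefully.

```python
import random
import numpy as np

def build_object_pool(images, annotations):
    """Collect cropped object patches per class from the dataset.
    `annotations` holds per-image lists of (x, y, w, h, class) boxes.
    (Hypothetical helper mirroring the paper's object-pool construction.)"""
    pool = {}
    for img, anns in zip(images, annotations):
        for (x, y, w, h, cls) in anns:
            pool.setdefault(cls, []).append(img[y:y + h, x:x + w].copy())
    return pool

def augment_image(img, pool, minority_classes, n_paste=3, rng=random):
    """Paste randomly sampled minority-class patches at random positions,
    returning the augmented image and the new ground-truth boxes."""
    out = img.copy()
    new_anns = []
    H, W = out.shape[:2]
    for _ in range(n_paste):
        cls = rng.choice(minority_classes)
        patch = rng.choice(pool[cls])
        ph, pw = patch.shape[:2]
        if ph >= H or pw >= W:
            continue  # patch does not fit in this image
        x = rng.randrange(0, W - pw)
        y = rng.randrange(0, H - ph)
        out[y:y + ph, x:x + pw] = patch  # naive paste; no blending or overlap check
        new_anns.append((x, y, pw, ph, cls))
    return out, new_anns
```

The augmented images produced this way can then be fed to the trained detector, and chips cropped around its misclassified regions become the hard chips used for the final round of training.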

Cite

Text

Hong et al. "Patch-Level Augmentation for Object Detection in Aerial Images." IEEE/CVF International Conference on Computer Vision Workshops, 2019. doi:10.1109/ICCVW.2019.00021

Markdown

[Hong et al. "Patch-Level Augmentation for Object Detection in Aerial Images." IEEE/CVF International Conference on Computer Vision Workshops, 2019.](https://mlanthology.org/iccvw/2019/hong2019iccvw-patchlevel/) doi:10.1109/ICCVW.2019.00021

BibTeX

@inproceedings{hong2019iccvw-patchlevel,
  title     = {{Patch-Level Augmentation for Object Detection in Aerial Images}},
  author    = {Hong, Sungeun and Kang, Sungil and Cho, Donghyeon},
  booktitle = {IEEE/CVF International Conference on Computer Vision Workshops},
  year      = {2019},
  pages     = {127-134},
  doi       = {10.1109/ICCVW.2019.00021},
  url       = {https://mlanthology.org/iccvw/2019/hong2019iccvw-patchlevel/}
}