Cheaper Pre-Training Lunch: An Efficient Paradigm for Object Detection

Abstract

In this paper, we propose a general and efficient pre-training paradigm, Montage pre-training, for object detection. Montage pre-training needs only the target detection dataset and requires only 1/4 of the computational resources of the widely adopted ImageNet pre-training. To build such an efficient paradigm, we reduce the potential redundancy by carefully extracting useful samples from the original images, assembling samples in a Montage manner as input, and using an ERF-adaptive dense classification strategy for model pre-training. These designs include not only a new input pattern to improve spatial utilization but also a novel learning objective to expand the effective receptive field of the pre-trained model. The efficiency and effectiveness of Montage pre-training are validated by extensive experiments on the MS-COCO dataset, where the results indicate that models using Montage pre-training achieve on-par or even better detection performance compared with ImageNet pre-training.
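To make the input pattern described above more concrete, the sketch below assembles four object crops taken from detection annotations into a single 2x2 "montage" input with a coarse region-level label map, the kind of target a dense classification head could be trained against. This is only an illustrative reading of the abstract, not the authors' released code: the crop-selection heuristics, the fixed 2x2 grid, the nearest-neighbour resizing, and the omission of the ERF-adaptive weighting are all simplifying assumptions.

import numpy as np

def crop_and_resize(image, box, size):
    """Crop a box (x1, y1, x2, y2) from an HxWx3 array and resize it to
    size x size via nearest-neighbour sampling (kept dependency-free)."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    patch = image[y1:y2, x1:x2]
    h, w = patch.shape[:2]
    ys = np.linspace(0, h - 1, size).astype(int)
    xs = np.linspace(0, w - 1, size).astype(int)
    return patch[ys][:, xs]

def assemble_montage(samples, cell=112):
    """Tile four (image, box, label) samples into one 2x2 montage.

    Returns the (2*cell, 2*cell, 3) montage image and a 2x2 label map
    for region-wise (dense) classification.
    """
    assert len(samples) == 4, "this sketch uses a fixed 2x2 grid"
    montage = np.zeros((2 * cell, 2 * cell, 3), dtype=np.uint8)
    label_map = np.zeros((2, 2), dtype=np.int64)
    for idx, (image, box, label) in enumerate(samples):
        r, c = divmod(idx, 2)
        montage[r * cell:(r + 1) * cell, c * cell:(c + 1) * cell] = \
            crop_and_resize(image, box, cell)
        label_map[r, c] = label
    return montage, label_map

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake detection samples: random images, each with one annotated box.
    samples = [(rng.integers(0, 256, (480, 640, 3), dtype=np.uint8),
                (50, 60, 300, 360), k) for k in range(4)]
    montage, labels = assemble_montage(samples)
    print(montage.shape, labels)

In this simplified form, each grid cell carries its own classification label, which is what allows a single montage input to supervise several samples at once and improve spatial utilization relative to one-image-one-label pre-training.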

Cite

Text

Zhou et al. "Cheaper Pre-Training Lunch: An Efficient Paradigm for Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2020. doi:10.1007/978-3-030-58598-3_16

Markdown

[Zhou et al. "Cheaper Pre-Training Lunch: An Efficient Paradigm for Object Detection." Proceedings of the European Conference on Computer Vision (ECCV), 2020.](https://mlanthology.org/eccv/2020/zhou2020eccv-cheaper/) doi:10.1007/978-3-030-58598-3_16

BibTeX

@inproceedings{zhou2020eccv-cheaper,
  title     = {{Cheaper Pre-Training Lunch: An Efficient Paradigm for Object Detection}},
  author    = {Zhou, Dongzhan and Zhou, Xinchi and Zhang, Hongwen and Yi, Shuai and Ouyang, Wanli},
  booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
  year      = {2020},
  doi       = {10.1007/978-3-030-58598-3_16},
  url       = {https://mlanthology.org/eccv/2020/zhou2020eccv-cheaper/}
}