InstaGen: Enhancing Object Detection by Training on Synthetic Dataset

Abstract

In this paper, we present a novel paradigm to enhance the ability of an object detector, e.g., expanding categories or improving detection performance, by training on a synthetic dataset generated from diffusion models. Specifically, we integrate an instance-level grounding head into a pre-trained generative diffusion model to augment it with the ability to localise instances in the generated images. The grounding head is trained to align the text embedding of category names with the regional visual feature of the diffusion model, using supervision from an off-the-shelf object detector and a novel self-training scheme on (novel) categories not covered by the detector. We conduct thorough experiments to show that this enhanced version of the diffusion model, termed InstaGen, can serve as a data synthesizer to enhance object detectors by training on its generated samples, demonstrating superior performance over existing state-of-the-art methods in open-vocabulary (+4.5 AP) and data-sparse (+1.2 to 5.2 AP) scenarios.
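The idea of aligning category text embeddings with regional visual features can be illustrated with a toy sketch. This is not the authors' implementation: the cosine-similarity scoring, the `classify_regions` helper, the `threshold` parameter, and the two-dimensional vectors are all assumptions made for illustration, and the abstract does not specify how the grounding head scores regions or filters pseudo-labels.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def classify_regions(region_feats, text_embeds, threshold=0.5):
    """Hypothetical helper: label each region with its best-matching category.

    Returns (category, score) per region, or None when the best score falls
    below `threshold` (an assumed stand-in for confidence-based filtering).
    """
    results = []
    for feat in region_feats:
        scores = {name: cosine(feat, emb) for name, emb in text_embeds.items()}
        best = max(scores, key=scores.get)
        results.append((best, scores[best]) if scores[best] >= threshold else None)
    return results


# Toy text embeddings for two category names (made-up values).
text_embeds = {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}
# Toy regional visual features (made-up values); the last matches neither.
region_feats = [[0.9, 0.1], [0.1, 0.8], [-1.0, -1.0]]

labels = classify_regions(region_feats, text_embeds)
for region_index, label in enumerate(labels):
    print(region_index, label)
```

In this sketch, a region is kept only when its best-matching category clears the threshold, which loosely mirrors how confident predictions might be retained as pseudo-labels for categories outside the detector's vocabulary.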

Cite

Text

Feng et al. "InstaGen: Enhancing Object Detection by Training on Synthetic Dataset." Conference on Computer Vision and Pattern Recognition, 2024. doi:10.1109/CVPR52733.2024.01339

Markdown

[Feng et al. "InstaGen: Enhancing Object Detection by Training on Synthetic Dataset." Conference on Computer Vision and Pattern Recognition, 2024.](https://mlanthology.org/cvpr/2024/feng2024cvpr-instagen/) doi:10.1109/CVPR52733.2024.01339

BibTeX

@inproceedings{feng2024cvpr-instagen,
  title     = {{InstaGen: Enhancing Object Detection by Training on Synthetic Dataset}},
  author    = {Feng, Chengjian and Zhong, Yujie and Jie, Zequn and Xie, Weidi and Ma, Lin},
  booktitle = {Conference on Computer Vision and Pattern Recognition},
  year      = {2024},
  pages     = {14121-14130},
  doi       = {10.1109/CVPR52733.2024.01339},
  url       = {https://mlanthology.org/cvpr/2024/feng2024cvpr-instagen/}
}