Visual Prompting via Image Inpainting

Abstract

How does one adapt a pre-trained visual model to novel downstream tasks without task-specific finetuning or any model modification? Inspired by prompting in NLP, this paper investigates visual prompting: given input-output image example(s) of a new task at test time and a new input image, the goal is to automatically produce the output image, consistent with the given examples. We show that posing this problem as simple image inpainting -- literally just filling in a hole in a concatenated visual prompt image -- turns out to be surprisingly effective, provided that the inpainting algorithm has been trained on the right data. We train masked autoencoders on a new dataset that we curated -- 88k unlabeled figures from academic papers sourced from arXiv. We apply visual prompting to these pretrained models and demonstrate results on various downstream image-to-image tasks, including foreground segmentation, single object detection, colorization, and edge detection, among others. Project page: https://yossigandelsman.github.io/visual_prompt
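
To make the setup concrete, below is a minimal sketch (in Python, using NumPy and PIL) of how such a visual prompt can be assembled: the example input-output pair and the query are tiled into a 2x2 grid image, and the bottom-right cell is masked out as the hole for the inpainting model to fill. The grid size and the `inpaint` call are illustrative assumptions, not the paper's exact implementation; `inpaint` stands in for a pretrained masked-image model.

# A minimal sketch of building a visual prompt as one image plus a hole mask.
# `inpaint(canvas, mask)` below is a hypothetical placeholder for a pretrained
# masked autoencoder, not an actual API from the paper's codebase.
import numpy as np
from PIL import Image

def make_visual_prompt(example_in, example_out, query_in, size=(111, 111)):
    """Arrange [example input | example output]
               [query input   | hole to fill  ]
    into a single canvas, plus a boolean mask marking the hole."""
    tiles = [np.asarray(im.convert("RGB").resize(size))
             for im in (example_in, example_out, query_in)]
    w, h = size
    canvas = np.zeros((2 * h, 2 * w, 3), dtype=np.uint8)
    canvas[:h, :w] = tiles[0]   # top-left: example input
    canvas[:h, w:] = tiles[1]   # top-right: example output
    canvas[h:, :w] = tiles[2]   # bottom-left: query input
    mask = np.zeros((2 * h, 2 * w), dtype=bool)
    mask[h:, w:] = True         # bottom-right: region to be inpainted
    return canvas, mask

# Usage sketch (assumes `inpaint` wraps the pretrained inpainting model):
#   canvas, mask = make_visual_prompt(x_example, y_example, x_query)
#   completed = inpaint(canvas, mask)
#   prediction = completed[mask.shape[0] // 2:, mask.shape[1] // 2:]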

Cite

Text

Bar et al. "Visual Prompting via Image Inpainting." Neural Information Processing Systems, 2022.

Markdown

[Bar et al. "Visual Prompting via Image Inpainting." Neural Information Processing Systems, 2022.](https://mlanthology.org/neurips/2022/bar2022neurips-visual/)

BibTeX

@inproceedings{bar2022neurips-visual,
  title     = {{Visual Prompting via Image Inpainting}},
  author    = {Bar, Amir and Gandelsman, Yossi and Darrell, Trevor and Globerson, Amir and Efros, Alexei},
  booktitle = {Neural Information Processing Systems},
  year      = {2022},
  url       = {https://mlanthology.org/neurips/2022/bar2022neurips-visual/}
}