Visual Prompt Tuning for Generative Transfer Learning
Abstract
Learning generative image models for diverse visual domains efficiently requires transferring knowledge from an image synthesis model trained on a large dataset. We present a recipe for learning vision transformers by generative knowledge transfer. Our framework builds on generative vision transformers that represent an image as a sequence of visual tokens, using either autoregressive or non-autoregressive transformers. To adapt to a new domain, we employ prompt tuning, which prepends learnable tokens called prompts to the image token sequence, and we introduce a new prompt design for our task. We study a variety of visual domains with varying amounts of training images, demonstrating effective knowledge transfer and significantly better image generation quality. Code is available at https://github.com/google-research/generative_transfer.
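To make the core idea concrete, here is a minimal sketch of prompt tuning as described in the abstract: learnable prompt embeddings are prepended to an image's visual token sequence before it enters a frozen transformer. All names and dimensions below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not the paper's configuration).
seq_len, d_model, n_prompts = 256, 768, 32

# Embeddings of an image's visual tokens, produced by a frozen tokenizer
# (placeholder random values here).
image_tokens = rng.normal(size=(seq_len, d_model))

# Learnable prompt embeddings. In prompt tuning these are the only
# parameters updated during transfer; the pretrained transformer stays frozen.
prompts = rng.normal(scale=0.02, size=(n_prompts, d_model))

# Prepend the prompts to the visual token sequence; the combined sequence
# is what the generative transformer attends over.
transformer_input = np.concatenate([prompts, image_tokens], axis=0)

print(transformer_input.shape)  # (288, 768)
```

Because only the prompt parameters are trained, adapting to a new domain touches a tiny fraction of the model's weights.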
Cite
Text
Sohn et al. "Visual Prompt Tuning for Generative Transfer Learning." Conference on Computer Vision and Pattern Recognition, 2023. doi:10.1109/CVPR52729.2023.01900
Markdown
[Sohn et al. "Visual Prompt Tuning for Generative Transfer Learning." Conference on Computer Vision and Pattern Recognition, 2023.](https://mlanthology.org/cvpr/2023/sohn2023cvpr-visual/) doi:10.1109/CVPR52729.2023.01900
BibTeX
@inproceedings{sohn2023cvpr-visual,
title = {{Visual Prompt Tuning for Generative Transfer Learning}},
author = {Sohn, Kihyuk and Chang, Huiwen and Lezama, José and Polania, Luisa and Zhang, Han and Hao, Yuan and Essa, Irfan and Jiang, Lu},
booktitle = {Conference on Computer Vision and Pattern Recognition},
year = {2023},
pages = {19840-19851},
doi = {10.1109/CVPR52729.2023.01900},
url = {https://mlanthology.org/cvpr/2023/sohn2023cvpr-visual/}
}