Black Box Adversarial Prompting for Foundation Models

Maus, Natalie; Chao, Patrick; Wong, Eric; Gardner, Jacob R.

ICMLW 2023

Abstract

Prompting interfaces allow users to quickly adjust the output of generative models in both vision and language. However, small changes and design choices in the prompt can lead to significant differences in the output. In this work, we develop a black-box framework for generating adversarial prompts for unstructured image and text generation. These prompts, which can be standalone or prepended to benign prompts, induce specific behaviors in the generative process, such as generating images of a particular object or generating high-perplexity text.

Cite

Text

Maus et al. "Black Box Adversarial Prompting for Foundation Models." ICML 2023 Workshops: AdvML-Frontiers, 2023.

Markdown

[Maus et al. "Black Box Adversarial Prompting for Foundation Models." ICML 2023 Workshops: AdvML-Frontiers, 2023.](https://mlanthology.org/icmlw/2023/maus2023icmlw-black/)

BibTeX

@inproceedings{maus2023icmlw-black,
  title     = {{Black Box Adversarial Prompting for Foundation Models}},
  author    = {Maus, Natalie and Chao, Patrick and Wong, Eric and Gardner, Jacob R.},
  booktitle = {ICML 2023 Workshops: AdvML-Frontiers},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/maus2023icmlw-black/}
}