Soft Prompting Might Be a Bug, Not a Feature

Abstract

Prompt tuning, or "soft prompting," replaces text prompts to generative models with learned embeddings (i.e., vectors) and is used as an alternative to parameter-efficient fine-tuning. Prior work suggests analyzing soft prompts by interpreting them as natural language prompts. However, we find that soft prompts occupy regions of the embedding space distinct from those containing natural language, meaning that direct comparisons may be misleading. We argue that because soft prompts are currently uninterpretable, they could be a source of vulnerability for LLMs to malicious manipulation during deployment.
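To make the setup concrete, here is a minimal numpy sketch of soft prompting: a few free vectors are prepended to the model's (frozen) token embeddings and would be trained by gradient descent. All sizes and names here are hypothetical toy choices, not the paper's actual setup; the last lines illustrate the paper's observation that a soft-prompt vector need not lie near any real token embedding.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, d_model, n_soft = 100, 16, 4  # toy sizes, not from the paper

# Frozen token-embedding table (stands in for the LLM's input embeddings).
embed_table = rng.normal(size=(vocab_size, d_model))

# Soft prompt: n_soft free vectors in embedding space. In real prompt
# tuning these are the only trained parameters; the model stays frozen.
soft_prompt = rng.normal(size=(n_soft, d_model))

def embed_input(token_ids, soft_prompt):
    """Prepend the soft-prompt vectors to the embedded text tokens."""
    token_embs = embed_table[token_ids]              # (seq, d_model)
    return np.concatenate([soft_prompt, token_embs], axis=0)

tokens = np.array([5, 17, 42])
x = embed_input(tokens, soft_prompt)                 # (n_soft + seq, d_model)

# Unlike text tokens, soft-prompt vectors are unconstrained: their distance
# to the nearest row of embed_table can be arbitrary, which is why reading
# them off as "nearest natural-language tokens" can be misleading.
dists = np.linalg.norm(embed_table[None, :, :] - soft_prompt[:, None, :], axis=-1)
nearest = dists.min(axis=1)                          # per soft vector
```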

Cite

Text

Bailey et al. "Soft Prompting Might Be a Bug, Not a Feature." ICML 2023 Workshops: DeployableGenerativeAI, 2023.

Markdown

[Bailey et al. "Soft Prompting Might Be a Bug, Not a Feature." ICML 2023 Workshops: DeployableGenerativeAI, 2023.](https://mlanthology.org/icmlw/2023/bailey2023icmlw-soft/)

BibTeX

@inproceedings{bailey2023icmlw-soft,
  title     = {{Soft Prompting Might Be a Bug, Not a Feature}},
  author    = {Bailey, Luke and Ahdritz, Gustaf and Kleiman, Anat and Swaroop, Siddharth and Doshi-Velez, Finale and Pan, Weiwei},
  booktitle = {ICML 2023 Workshops: DeployableGenerativeAI},
  year      = {2023},
  url       = {https://mlanthology.org/icmlw/2023/bailey2023icmlw-soft/}
}