GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

Abstract

LLMs have demonstrated remarkable capabilities but remain highly susceptible to adversarial prompts despite extensive safety-alignment efforts, raising serious security concerns for their real-world adoption. Existing jailbreak attacks rely on manual heuristics or computationally expensive optimization techniques, both of which struggle with generalization and efficiency. In this paper, we introduce GASP, a novel black-box attack framework that leverages latent Bayesian optimization to generate human-readable adversarial suffixes. Unlike prior methods, GASP efficiently explores continuous embedding spaces, optimizing for strong adversarial suffixes while preserving prompt coherence. We evaluate our method across multiple LLMs, showing its ability to produce natural and effective jailbreak prompts. Compared with alternatives, GASP significantly improves attack success rates and reduces computation costs, offering a scalable approach for red-teaming LLMs.
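To make the core idea concrete, below is a minimal sketch of Bayesian optimization over a continuous latent space, the general technique the abstract names. It is not the GASP implementation: the objective `attack_score` is a hypothetical stand-in for querying a victim model, the latent dimension and search bounds are illustrative, and the surrogate is a plain Gaussian process with random candidate sampling.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical black-box objective: in GASP this would score a candidate
# suffix (decoded from latent z) against the victim LLM. Here it is a
# simple quadratic so the sketch runs end to end.
def attack_score(z):
    return -np.sum((z - 0.3) ** 2)

def rbf_kernel(A, B, length_scale=0.5):
    # Squared-exponential kernel between rows of A and B.
    d = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d / length_scale**2)

def gp_posterior(X, y, Xq, noise=1e-6):
    # GP posterior mean and std at query points Xq, given data (X, y).
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    Ks = rbf_kernel(X, Xq)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.ones(len(Xq)) - np.sum(v**2, axis=0)  # diag of RBF self-cov is 1
    return mu, np.sqrt(np.maximum(var, 1e-12))

def expected_improvement(mu, sigma, best):
    # EI acquisition for maximization.
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
dim = 8                                    # latent embedding dimension (illustrative)
X = rng.uniform(-1, 1, (5, dim))           # initial random latent points
y = np.array([attack_score(z) for z in X])

for _ in range(20):
    cand = rng.uniform(-1, 1, (256, dim))  # random candidate latents
    mu, sigma = gp_posterior(X, y, cand)
    z_next = cand[np.argmax(expected_improvement(mu, sigma, y.max()))]
    X = np.vstack([X, z_next])             # evaluate the most promising latent
    y = np.append(y, attack_score(z_next))

print("best score found:", y.max())
```

The sample-efficiency argument is visible in the loop: each iteration spends one expensive black-box query (one call to the victim model, in the paper's setting) on the candidate the surrogate deems most promising, rather than on gradient estimates or exhaustive token search.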

Cite

Text

Basani and Zhang. "GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs." ICLR 2025 Workshops: BuildingTrust, 2025.

Markdown

[Basani and Zhang. "GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs." ICLR 2025 Workshops: BuildingTrust, 2025.](https://mlanthology.org/iclrw/2025/basani2025iclrw-gasp/)

BibTeX

@inproceedings{basani2025iclrw-gasp,
  title     = {{GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs}},
  author    = {Basani, Advik Raj and Zhang, Xiao},
  booktitle = {ICLR 2025 Workshops: BuildingTrust},
  year      = {2025},
  url       = {https://mlanthology.org/iclrw/2025/basani2025iclrw-gasp/}
}