IntentBreaker: Intent-Adaptive Jailbreak Attack on Large Language Models

Abstract

Recent research on jailbreak attacks has uncovered substantial robustness vulnerabilities in existing large language models (LLMs), enabling attackers to bypass safety guardrails through carefully crafted malicious prompts. Such prompts can induce the generation of harmful content, posing significant safety and ethical concerns. In this paper, we reveal that the difficulty of successfully jailbreaking LLMs varies considerably depending on the intent of the attacker, which inherently limits the overall attack success rate (ASR). Current approaches mostly rely on generic jailbreak templates and optimization strategies, and this lack of adaptability limits their effectiveness and efficiency across diverse jailbreak intents. To address this limitation, we introduce IntentBreaker, a novel intent-adaptive jailbreak framework built on a hybrid evolutionary algorithm. Our approach categorizes malicious prompts into nine distinct intents and incorporates three adaptive improvements: template initialization, a lexicon-based fitness function, and dynamic mutation operations, which are designed to align generated outputs more closely with the attack intent. Comprehensive experimental evaluations demonstrate that IntentBreaker achieves an average ASR of 98.61% across five open-source LLMs, outperforming baseline methods by 42.25%.
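The abstract's hybrid evolutionary loop can be illustrated with a minimal sketch. This is not the paper's implementation: the lexicon contents, the `query_model` interface, and the shuffle-based mutation below are all hypothetical placeholders standing in for the paper's intent lexicons, target-LLM queries, and dynamic mutation operations.

```python
import random

# Hypothetical intent lexicon: keywords whose presence in a model's
# response is taken as evidence the output matches the attack intent.
# The paper's nine intent categories and their lexicons are not public here.
INTENT_LEXICONS = {
    "example_intent": ["step", "first", "then"],
}


def lexicon_fitness(response: str, intent: str) -> float:
    """Score a response as the fraction of intent-lexicon terms it contains."""
    terms = INTENT_LEXICONS[intent]
    hits = sum(1 for term in terms if term in response.lower())
    return hits / len(terms)


def mutate(template: str, rng: random.Random) -> str:
    """Toy mutation: shuffle words. The paper's dynamic mutation ops differ."""
    words = template.split()
    rng.shuffle(words)
    return " ".join(words)


def evolve(templates, query_model, intent, generations=5, seed=0):
    """Minimal evolutionary loop: score all templates against the target
    model, keep the fitter half, and refill the population via mutation."""
    rng = random.Random(seed)
    pop = list(templates)
    for _ in range(generations):
        ranked = sorted(
            pop,
            key=lambda t: lexicon_fitness(query_model(t), intent),
            reverse=True,
        )
        survivors = ranked[: max(1, len(ranked) // 2)]
        pop = survivors + [mutate(t, rng) for t in survivors]
    return max(pop, key=lambda t: lexicon_fitness(query_model(t), intent))
```

With a stub `query_model` that simply echoes its input, the loop converges on the template whose wording best covers the intent lexicon, which is the core of the intent-adaptive fitness idea.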

Cite

Text

Guo et al. "IntentBreaker: Intent-Adaptive Jailbreak Attack on Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025. doi:10.1007/978-3-032-06078-5_14

Markdown

[Guo et al. "IntentBreaker: Intent-Adaptive Jailbreak Attack on Large Language Models." European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, 2025.](https://mlanthology.org/ecmlpkdd/2025/guo2025ecmlpkdd-intentbreaker/) doi:10.1007/978-3-032-06078-5_14

BibTeX

@inproceedings{guo2025ecmlpkdd-intentbreaker,
  title     = {{IntentBreaker: Intent-Adaptive Jailbreak Attack on Large Language Models}},
  author    = {Guo, Shengnan and Zhai, Yuchen and Zhang, Shenyi and Zhao, Lingchen and Wang, Zhangyi},
  booktitle = {European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  year      = {2025},
  pages     = {240--256},
  doi       = {10.1007/978-3-032-06078-5_14},
  url       = {https://mlanthology.org/ecmlpkdd/2025/guo2025ecmlpkdd-intentbreaker/}
}