Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks

Abstract

Optimization methods are widely employed in deep learning to identify and mitigate undesired model responses. While gradient-based techniques have proven effective for image models, their application to language models is hindered by the discrete nature of the input space. This study introduces a novel optimization approach, termed the *functional homotopy* method, which leverages the functional duality between model training and input generation: gradient descent in the continuous parameter space is used to smooth the discrete optimization problem over inputs. By constructing a series of easy-to-hard optimization problems, we iteratively solve them using principles derived from established homotopy methods. We apply this approach to jailbreak attack synthesis for large language models (LLMs), achieving a 20%-30% improvement in success rate over existing methods in circumventing established safety-aligned open-source models such as Llama-2 and Llama-3.
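
The abstract compresses a two-phase procedure: first relax the problem by fine-tuning the model's continuous parameters until the current input succeeds (yielding a trajectory of progressively easier models), then walk the parameters back toward the original model while re-solving the discrete input search at each step, warm-started from the previous solution. Below is a minimal sketch of that loop, assuming a toy binary-input PyTorch classifier in place of an LLM; the names (`loss_of`, `greedy_flip_step`, the MLP, the single-bit-flip search) are illustrative stand-ins, not the paper's implementation.

```python
import copy
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM, FT_STEPS = 32, 50
model = nn.Sequential(nn.Linear(DIM, 64), nn.ReLU(), nn.Linear(64, 2))
target = torch.tensor([1])        # hypothetical "attack succeeds" label
loss_fn = nn.CrossEntropyLoss()

def loss_of(m, x):
    # Loss of discrete 0/1 input x under model m, lower = closer to target.
    return loss_fn(m(x.float().unsqueeze(0)), target)

def greedy_flip_step(m, x):
    # One pass of discrete search: keep the single bit flip that helps most.
    with torch.no_grad():
        best_x, best_loss = x, loss_of(m, x)
        for i in range(DIM):
            cand = x.clone()
            cand[i] = 1 - cand[i]
            l = loss_of(m, cand)
            if l < best_loss:
                best_x, best_loss = cand, l
    return best_x

x = torch.randint(0, 2, (DIM,))

# Phase 1 (smoothing): gradient descent on the *parameters*, holding the
# input fixed, producing a trajectory from the hard original model to an
# easy one on which the current input already succeeds.
snapshots = [copy.deepcopy(model.state_dict())]
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(FT_STEPS):
    opt.zero_grad()
    loss_of(model, x).backward()
    opt.step()
    snapshots.append(copy.deepcopy(model.state_dict()))

# Phase 2 (homotopy): walk back from the easiest model to the original,
# warm-starting the discrete input search with the previous solution.
for state in reversed(snapshots):
    model.load_state_dict(state)
    x = greedy_flip_step(model, x)

print("loss on the original model:", loss_of(model, x).item())
```

The warm start is what the easy-to-hard sequence buys: the intuition is that each intermediate model's solution lies close to a solution of the next, slightly harder problem, so the discrete search only ever has to make local moves rather than solve the hard original problem from scratch.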

Cite

Text

Wang et al. "Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks." International Conference on Learning Representations, 2025.

Markdown

[Wang et al. "Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks." International Conference on Learning Representations, 2025.](https://mlanthology.org/iclr/2025/wang2025iclr-functional/)

BibTeX

@inproceedings{wang2025iclr-functional,
  title     = {{Functional Homotopy: Smoothing Discrete Optimization via Continuous Parameters for LLM Jailbreak Attacks}},
  author    = {Wang, Zi and Anshumaan, Divyam and Hooda, Ashish and Chen, Yudong and Jha, Somesh},
  booktitle = {International Conference on Learning Representations},
  year      = {2025},
  url       = {https://mlanthology.org/iclr/2025/wang2025iclr-functional/}
}