SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models
Abstract
As Large Language Models (LLMs) continue to advance in capability and influence, ensuring their security and preventing harmful outputs has become crucial. A promising approach to these concerns is to train models that automatically generate adversarial prompts for red teaming. However, as the vulnerabilities of LLMs become increasingly subtle, current adversarial methods lose effectiveness: they struggle both to generate diverse, complex prompts and to dynamically explore the weaknesses of these models. To tackle these challenges, we introduce the Self-Evolving Adversarial Safety (SEAS) optimization framework, which comprises a SEAS dataset and a SEAS pipeline. The SEAS dataset contains complex adversarial prompts, while the SEAS pipeline operates in three stages: Initialization, Attack, and Adversarial Optimization. The framework generates a diverse range of adversarial prompts and dynamically explores the model's vulnerabilities in order to enhance its security. Our contributions include a novel adversarial framework, a comprehensive safety dataset, and empirical evidence demonstrating the effectiveness of SEAS.
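To make the control flow of the three-stage pipeline concrete, here is a minimal toy simulation. It is a sketch under stated assumptions, not the authors' implementation: the classes `ToyRedTeam` and `ToyTarget`, the function `run_pipeline`, the attack "style" names, and the update rules are all hypothetical. Real SEAS trains LLMs, whereas here the attacker is a weighted sampler and the target is a set of refused styles. We also assume that Adversarial Optimization updates both the attacker and the target each round, and we track attack success rate (ASR, the fraction of prompts that elicit an unsafe response) as an illustrative metric; the abstract names the stages but does not spell out their objectives.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ToyRedTeam:
    """Hypothetical stand-in for a red-team model: samples attack 'styles' by weight."""
    weights: dict  # style name -> sampling weight

    def generate(self, n: int, rng: random.Random) -> list:
        styles = list(self.weights)
        return rng.choices(styles, weights=[self.weights[s] for s in styles], k=n)

    def update(self, successes: list) -> None:
        # Toy analogue of optimizing the attacker: reinforce styles that got through.
        for style in successes:
            self.weights[style] += 1.0


@dataclass
class ToyTarget:
    """Hypothetical stand-in for the target model: refuses styles it was trained on."""
    refused: set = field(default_factory=set)

    def is_unsafe(self, style: str) -> bool:
        # Toy judge: the response counts as unsafe if the target does not refuse the style.
        return style not in self.refused

    def update(self, successes: list) -> None:
        # Toy analogue of safety optimization: learn to refuse every successful style.
        self.refused.update(successes)


def run_pipeline(seed_styles, iterations=3, prompts_per_round=50, seed=0):
    rng = random.Random(seed)
    # Stage 1 (Initialization): uniform attacker, target that refuses nothing yet.
    red_team = ToyRedTeam(weights={s: 1.0 for s in seed_styles})
    target = ToyTarget()
    asr_history = []
    for _ in range(iterations):
        # Stage 2 (Attack): the attacker generates prompts; a judge labels the outcomes.
        prompts = red_team.generate(prompts_per_round, rng)
        successes = [p for p in prompts if target.is_unsafe(p)]
        asr_history.append(len(successes) / len(prompts))
        # Stage 3 (Adversarial Optimization): update both sides on this round's results.
        red_team.update(successes)
        target.update(successes)
    return asr_history, target.refused


history, refused = run_pipeline(["role-play", "obfuscation", "multi-turn", "hypothetical"])
print(history, sorted(refused))
```

Because the toy target simply memorizes exact styles, its ASR collapses after the first round; the sketch only illustrates how the Initialization, Attack, and Adversarial Optimization stages feed into one another, not any dynamics or results reported in the paper.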
Cite
Diao et al. "SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models." AAAI Conference on Artificial Intelligence, 2025. doi:10.1609/AAAI.V39I22.34549
@inproceedings{diao2025aaai-seas,
title = {{SEAS: Self-Evolving Adversarial Safety Optimization for Large Language Models}},
author = {Diao, Muxi and Li, Rumei and Liu, Shiyang and Liao, Guogang and Wang, Jingang and Cai, Xunliang and Xu, Weiran},
booktitle = {AAAI Conference on Artificial Intelligence},
year = {2025},
pages = {23778-23786},
doi = {10.1609/AAAI.V39I22.34549},
url = {https://mlanthology.org/aaai/2025/diao2025aaai-seas/}
}