Xie, Zhixin

1 publication

NeurIPS 2025. Attack via Overfitting: 10-Shot Benign Fine-Tuning to Jailbreak LLMs. Zhixin Xie, Xurui Song, Jun Luo.