Zhang, Jingfeng
40 publications
ICML
2025
One Stone, Two Birds: Enhancing Adversarial Defense Through the Lens of Distributional Discrepancy
NeurIPS
2025
Short-Length Adversarial Training Helps LLMs Defend Long-Length Jailbreak Attacks: Theoretical and Empirical Evidence
NeurIPS
2022
Adversarial Training with Complementary Labels: On the Benefit of Gradually Informative Attacks
NeurIPSW
2022
Model and Method: Training-Time Attack for Cooperative Multi-Agent Reinforcement Learning