ML Anthology
Zeng, Yifan
3 publications
ICLR 2025
A Common Pitfall of Margin-Based Language Model Alignment: Gradient Entanglement
Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang, Liu Leqi
NeurIPSW 2024
A Common Pitfall of Margin-Based Language Model Alignment: Gradient Entanglement
Hui Yuan, Yifan Zeng, Yue Wu, Huazheng Wang, Mengdi Wang, Liu Leqi
NeurIPSW 2024
AutoDefense: Multi-Agent LLM Defense Against Jailbreak Attacks
Yifan Zeng, Yiran Wu, Xiao Zhang, Huazheng Wang, Qingyun Wu