ML Anthology
Hu, Chengzhi
1 publication
ICLR 2025
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank