Hu, Chengzhi

1 publication

ICLR 2025. Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation. Xinpeng Wang, Chengzhi Hu, Paul Röttger, Barbara Plank