ML Anthology
Dabas, Mahavir
1 publication
ICML 2025
Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning
Mahavir Dabas, Si Chen, Charles Fleming, Ming Jin, Ruoxi Jia