Dabas, Mahavir

1 publication

ICML 2025. Just Enough Shifts: Mitigating Over-Refusal in Aligned Language Models with Targeted Representation Fine-Tuning. Mahavir Dabas, Si Chen, Charles Fleming, Ming Jin, Ruoxi Jia.