Huang, Tiansheng
13 publications
- Booster: Tackling Harmful Fine-Tuning for Large Language Models via Attenuating Harmful Perturbation. ICLR 2025.
- Panacea: Mitigating Harmful Fine-Tuning for Large Language Models via Post-Fine-Tuning Perturbation. NeurIPS 2025.
- Lisa: Lazy Safety Alignment for Large Language Models Against Harmful Fine-Tuning Attack. NeurIPS 2024.
- Vaccine: Perturbation-Aware Alignment for Large Language Models Against Harmful Fine-Tuning Attack. NeurIPS 2024.