ML Anthology
Si, Wai Man
2 publications
NeurIPS 2025
Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang
ICLR 2025
SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation
Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang