ML Anthology
Huang, Ruixuan
1 publication
NeurIPS 2024
Uncovering Safety Risks of Large Language Models Through Concept Activation Vector
Zhihao Xu, Ruixuan Huang, Changyu Chen, Xiting Wang