Huang, Ruixuan

1 publication

NeurIPS 2024. Uncovering Safety Risks of Large Language Models Through Concept Activation Vector. Zhihao Xu, Ruixuan Huang, Changyu Chen, Xiting Wang.