ML Anthology
Huang, Ruixuan
2 publications
ICLR 2026
GuidedBench: Measuring and Mitigating the Evaluation Discrepancies of In-the-Wild LLM Jailbreak Methods
Ruixuan Huang, Xunguang Wang, Zongjie Li, Daoyuan Wu, Shuai Wang
NeurIPS 2024
Uncovering Safety Risks of Large Language Models Through Concept Activation Vector
Zhihao Xu, Ruixuan Huang, Changyu Chen, Xiting Wang