ML Anthology
Hu, Xiaomeng
5 publications
NeurIPS
2025
CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention
Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho
AAAI
2025
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS
2024
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPSW
2024
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models
Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS
2023
RADAR: Robust AI-Text Detection via Adversarial Learning
Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho