Hu, Xiaomeng

5 publications

NeurIPS 2025. CARE: Decoding-Time Safety Alignment via Rollback and Introspection Intervention. Xiaomeng Hu, Fei Huang, Chenhan Yuan, Junyang Lin, Tsung-Yi Ho
AAAI 2025. Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS 2024. Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes. Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPSW 2024. Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models. Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
NeurIPS 2023. RADAR: Robust AI-Text Detection via Adversarial Learning. Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho