Ho, Tsung-Yi
36 publications
NeurIPS
2024
GREAT Score: Global Robustness Evaluation of Adversarial Perturbation Using Generative Models
NeurIPS
2024
Gradient Cuff: Detecting Jailbreak Attacks on Large Language Models by Exploring Refusal Loss Landscapes
TMLR
2024
Neural Clamping: Joint Input Perturbation and Temperature Scaling for Neural Network Calibration
NeurIPS
2024
NeuralFuse: Learning to Recover the Accuracy of Access-Limited Neural Network Inference in Low-Voltage Regimes
NeurIPSW
2024
Token Highlighter: Inspecting and Mitigating Jailbreak Prompts for Large Language Models