Qi, Xiangyu

17 publications

ICLR 2025. On Evaluating the Durability of Safeguards for Open-Weight LLMs. Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, Peter Henderson
ICLR 2025. SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal. Tinghao Xie, Xiangyu Qi, Yi Zeng, Yangsibo Huang, Udari Madhushani Sehwag, Kaixuan Huang, Luxi He, Boyi Wei, Dacheng Li, Ying Sheng, Ruoxi Jia, Bo Li, Kai Li, Danqi Chen, Peter Henderson, Prateek Mittal
ICLR 2025. Safety Alignment Should Be Made More than Just a Few Tokens Deep. Xiangyu Qi, Ashwinee Panda, Kaifeng Lyu, Xiao Ma, Subhrajit Roy, Ahmad Beirami, Prateek Mittal, Peter Henderson
ICML 2024. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications. Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
ICLRW 2024. Assessing the Brittleness of Safety Alignment via Pruning and Low-Rank Modifications. Boyi Wei, Kaixuan Huang, Yangsibo Huang, Tinghao Xie, Xiangyu Qi, Mengzhou Xia, Prateek Mittal, Mengdi Wang, Peter Henderson
ICLR 2024. BaDExpert: Extracting Backdoor Functionality for Accurate Backdoor Input Detection. Tinghao Xie, Xiangyu Qi, Ping He, Yiming Li, Jiachen T. Wang, Prateek Mittal
NeurIPS 2024. BackdoorAlign: Mitigating Fine-Tuning Based Jailbreak Attack with Backdoor Enhanced Safety Alignment. Jiongxiao Wang, Jiazhao Li, Yiquan Li, Xiangyu Qi, Junjie Hu, Yixuan Li, Patrick McDaniel, Muhao Chen, Bo Li, Chaowei Xiao
ICLR 2024. Fine-Tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend to! Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal, Peter Henderson
ICMLW 2024. Lottery Ticket Adaptation: Mitigating Destructive Interference in LLMs. Ashwinee Panda, Berivan Isik, Xiangyu Qi, Sanmi Koyejo, Tsachy Weissman, Prateek Mittal
AAAI 2024. Visual Adversarial Examples Jailbreak Aligned Large Language Models. Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Peter Henderson, Mengdi Wang, Prateek Mittal
ICLR 2023. Revisiting the Assumption of Latent Separability for Backdoor Defenses. Xiangyu Qi, Tinghao Xie, Yiming Li, Saeed Mahloujifar, Prateek Mittal
ICML 2023. Uncovering Adversarial Risks of Test-Time Adaptation. Tong Wu, Feiran Jia, Xiangyu Qi, Jiachen T. Wang, Vikash Sehwag, Saeed Mahloujifar, Prateek Mittal
ICMLW 2023. Visual Adversarial Examples Jailbreak Aligned Large Language Models. Xiangyu Qi, Kaixuan Huang, Ashwinee Panda, Mengdi Wang, Prateek Mittal
CVPR 2022. Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks. Xiangyu Qi, Tinghao Xie, Ruizhe Pan, Jifeng Zhu, Yong Yang, Kai Bu
ICML 2021. Knowledge Enhanced Machine Learning Pipeline Against Diverse Adversarial Attacks. Nezihe Merve Gürel, Xiangyu Qi, Luka Rimanic, Ce Zhang, Bo Li