Zhu, Sicheng

18 publications

NeurIPS 2025. AdvPrefix: An Objective for Nuanced LLM Jailbreaks. Sicheng Zhu, Brandon Amos, Yuandong Tian, Chuan Guo, Ivan Evtimov.
AAAI 2025. Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang.
ICLR 2025. GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment. Yuancheng Xu, Udari Madhushani Sehwag, Alec Koppel, Sicheng Zhu, Bang An, Furong Huang, Sumitra Ganesh.
ICMLW 2024. Automatic Pseudo-Harmful Prompt Generation for Evaluating False Refusals in Large Language Models. Bang An, Sicheng Zhu, Ruiyi Zhang, Michael-Andrei Panaitescu-Liess, Yuancheng Xu, Furong Huang.
ICMLW 2024. Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang.
NeurIPSW 2024. Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? Michael-Andrei Panaitescu-Liess, Zora Che, Bang An, Yuancheng Xu, Pankayaraj Pathmanathan, Souradip Chakraborty, Sicheng Zhu, Tom Goldstein, Furong Huang.
ICLR 2024. Like Oil and Water: Group Robustness Methods and Poisoning Defenses May Be at Odds. Michael-Andrei Panaitescu-Liess, Yigitcan Kaya, Sicheng Zhu, Furong Huang, Tudor Dumitras.
ICLR 2024. PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts. Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang.
NeurIPSW 2024. PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models. Michael-Andrei Panaitescu-Liess, Pankayaraj Pathmanathan, Yigitcan Kaya, Zora Che, Bang An, Sicheng Zhu, Aakriti Agrawal, Furong Huang.
ICML 2024. Position: On the Possibilities of AI-Generated Text Detection. Souradip Chakraborty, Amrit Bedi, Sicheng Zhu, Bang An, Dinesh Manocha, Furong Huang.
ICML 2024. WAVES: Benchmarking the Robustness of Image Watermarks. Bang An, Mucong Ding, Tahseen Rabbani, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang.
ICLRW 2024. WAVES: Benchmarking the Robustness of Image Watermarks. Mucong Ding, Tahseen Rabbani, Bang An, Aakriti Agrawal, Yuancheng Xu, Chenghao Deng, Sicheng Zhu, Abdirisak Mohamed, Yuxin Wen, Tom Goldstein, Furong Huang.
NeurIPSW 2023. AutoDAN: Automatic and Interpretable Adversarial Attacks on Large Language Models. Sicheng Zhu, Ruiyi Zhang, Bang An, Gang Wu, Joe Barrow, Zichao Wang, Furong Huang, Ani Nenkova, Tong Sun.
ICML 2023. Learning Unforeseen Robustness from Out-of-Distribution Data Using Equivariant Domain Translator. Sicheng Zhu, Bang An, Furong Huang, Sanghyun Hong.
ICLRW 2023. Learning Unforeseen Robustness from Out-of-Distribution Data Using Equivariant Domain Translator. Sicheng Zhu, Bang An, Furong Huang, Sanghyun Hong.
ICMLW 2023. More Context, Less Distraction: Improving Zero-Shot Inference of CLIP by Inferring and Describing Spurious Features. Bang An, Sicheng Zhu, Michael-Andrei Panaitescu-Liess, Chaithanya Kumar Mummadi, Furong Huang.
NeurIPS 2021. Understanding the Generalization Benefit of Model Invariance from a Data Perspective. Sicheng Zhu, Bang An, Furong Huang.
ICML 2020. Learning Adversarially Robust Representations via Worst-Case Mutual Information Maximization. Sicheng Zhu, Xiao Zhang, David Evans.