Zhou, Andy

18 publications

ICLR 2025. AIR-BENCH 2024: A Safety Benchmark Based on Regulation and Policies Specified Risk Categories. Yi Zeng, Yu Yang, Andy Zhou, Jeffrey Ziwei Tan, Yuheng Tu, Yifan Mai, Kevin Klyman, Minzhou Pan, Ruoxi Jia, Dawn Song, Percy Liang, Bo Li
NeurIPS 2025. AutoRedTeamer: Autonomous Red Teaming with Lifelong Attack Integration. Andy Zhou, Kevin Wu, Francesco Pinto, Zhaorun Chen, Yi Zeng, Yu Yang, Shuang Yang, Sanmi Koyejo, James Zou, Bo Li
ICLRW 2025. Compositional Subspace Representation Fine-Tuning for Adaptive Large Language Models. Andy Zhou, Ron Arel
ICLR 2025. MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models. Chejian Xu, Jiawei Zhang, Zhaorun Chen, Chulin Xie, Mintong Kang, Yujin Potter, Zhun Wang, Zhuowen Yuan, Alexander Xiong, Zidi Xiong, Chenhui Zhang, Lingzhi Yuan, Yi Zeng, Peiyang Xu, Chengquan Guo, Andy Zhou, Jeffrey Ziwei Tan, Xuandong Zhao, Francesco Pinto, Zhen Xiang, Yu Gai, Zinan Lin, Dan Hendrycks, Bo Li, Dawn Song
ICLRW 2025. Siege: Multi-Turn Jailbreaking of Large Language Models with Tree Search. Andy Zhou, Ron Arel
ICLR 2025. Tamper-Resistant Safeguards for Open-Weight LLMs. Rishub Tamirisa, Bhrugu Bharathi, Long Phan, Andy Zhou, Alice Gatti, Tarun Suresh, Maxwell Lin, Justin Wang, Rowan Wang, Ron Arel, Andy Zou, Dawn Song, Bo Li, Dan Hendrycks, Mantas Mazeika
CVPR 2024. FedSelect: Personalized Federated Learning with Customized Selection of Parameters for Fine-Tuning. Rishub Tamirisa, Chulin Xie, Wenxuan Bao, Andy Zhou, Ron Arel, Aviv Shamsian
ICLRW 2024. GUARD: Role-Playing to Generate Natural-Language Jailbreakings to Test Guideline Adherence of Large Language Models. Haibo Jin, Ruoxi Chen, Andy Zhou, Yang Zhang, Haohan Wang
NeurIPS 2024. Jailbreaking Large Language Models Against Moderation Guardrails via Cipher Characters. Haibo Jin, Andy Zhou, Joe D. Menke, Haohan Wang
ICLRW 2024. Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models. Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
ICML 2024. Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models. Andy Zhou, Kai Yan, Michal Shlapentokh-Rothman, Haohan Wang, Yu-Xiong Wang
NeurIPS 2024. RedCode: Risky Code Execution and Generation Benchmark for Code Agents. Chengquan Guo, Xun Liu, Chulin Xie, Andy Zhou, Yi Zeng, Zinan Lin, Dawn Song, Bo Li
NeurIPS 2024. Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks. Andy Zhou, Bo Li, Haohan Wang
ICLRW 2024. Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks. Andy Zhou, Bo Li, Haohan Wang
ICCV 2023. A Sentence Speaks a Thousand Images: Domain Generalization Through Distilling CLIP with Language Guidance. Zeyi Huang, Andy Zhou, Zijian Ling, Mu Cai, Haohan Wang, Yong Jae Lee
NeurIPS 2023. Distilling Out-of-Distribution Robustness from Vision-Language Foundation Models. Andy Zhou, Jindong Wang, Yu-Xiong Wang, Haohan Wang
ICMLW 2023. FedSelect: Customized Selection of Parameters for Fine-Tuning During Personalized Federated Learning. Rishub Tamirisa, John Won, Chengjun Lu, Ron Arel, Andy Zhou
NeurIPS 2023. YouTubePD: A Multimodal Benchmark for Parkinson’s Disease Analysis. Andy Zhou, Samuel Li, Pranav Sriram, Xiang Li, Jiahua Dong, Ansh Sharma, Yuanyi Zhong, Shirui Luo, Volodymyr Kindratenko, George Heintz, Christopher Zallek, Yu-Xiong Wang