Zhang, Jiaheng

9 publications

ICLRW 2025 Efficient Backdoor Detection on Text-to-Image Synthesis via Neuron Activation Variation Shengfang Zhai, Jiajun Li, Yue Liu, Yinpeng Dong, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Jiaheng Zhang
ICCV 2025 Efficient Input-Level Backdoor Defense on Text-to-Image Synthesis via Neuron Activation Variation Shengfang Zhai, Jiajun Li, Yue Liu, Huanran Chen, Zhihua Tian, Wenjie Qu, Qingni Shen, Ruoxi Jia, Yinpeng Dong, Jiaheng Zhang
ICML 2025 FlipAttack: Jailbreak LLMs via Flipping Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Yingwei Ma, Jiaheng Zhang, Bryan Hooi
ICLRW 2025 FlipAttack: Jailbreak LLMs via Flipping Yue Liu, Xiaoxin He, Miao Xiong, Jinlan Fu, Shumin Deng, Yingwei Ma, Jiaheng Zhang, Bryan Hooi
ICLRW 2025 Geneshift: Impact of Different Scenario Shift on Jailbreaking LLM Tianyi Wu, Zhiwei Xue, Yue Liu, Jiaheng Zhang, Bryan Hooi, See-Kiong Ng
NeurIPS 2025 GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning Yue Liu, Shengfang Zhai, Mingzhe Du, Yulin Chen, Tri Cao, Hongcheng Gao, Cheng Wang, Xinfeng Li, Kun Wang, Junfeng Fang, Jiaheng Zhang, Bryan Hooi
ICLRW 2025 GuardReasoner: Towards Reasoning-Based LLM Safeguards Yue Liu, Hongcheng Gao, Shengfang Zhai, Jun Xia, Tianyi Wu, Zhiwei Xue, Yulin Chen, Kenji Kawaguchi, Jiaheng Zhang, Bryan Hooi
TMLR 2025 Reliable and Responsible Foundation Models Xinyu Yang, Junlin Han, Rishi Bommasani, Jinqi Luo, Wenjie Qu, Wangchunshu Zhou, Adel Bibi, Xiyao Wang, Jaehong Yoon, Elias Stengel-Eskin, Shengbang Tong, Lingfeng Shen, Rafael Rafailov, Runjia Li, Zhaoyang Wang, Yiyang Zhou, Chenhang Cui, Yu Wang, Wenhao Zheng, Huichi Zhou, Jindong Gu, Zhaorun Chen, Peng Xia, Tony Lee, Thomas P Zollo, Vikash Sehwag, Jixuan Leng, Jiuhai Chen, Yuxin Wen, Huan Zhang, Zhun Deng, Linjun Zhang, Pavel Izmailov, Pang Wei Koh, Yulia Tsvetkov, Andrew Gordon Wilson, Jiaheng Zhang, James Zou, Cihang Xie, Hao Wang, Philip Torr, Julian McAuley, David Alvarez-Melis, Florian Tramèr, Kaidi Xu, Suman Jana, Chris Callison-Burch, Rene Vidal, Filippos Kokkinos, Mohit Bansal, Beidi Chen, Huaxiu Yao
ICLRW 2024 PANDORA: Detailed LLM Jailbreaking via Collaborated Phishing Agents with Decomposed Reasoning Zhaorun Chen, Zhuokai Zhao, Wenjie Qu, Zichen Wen, Zhiguang Han, Zhihong Zhu, Jiaheng Zhang, Huaxiu Yao