Zhang, Zhuo

7 publications

AAAI 2025 Correcting Large Language Model Behavior via Influence Function. Han Zhang, Zhuo Zhang, Yi Zhang, Yuanzhao Zhai, Hanyang Peng, Yu Lei, Yue Yu, Hui Wang, Bin Liang, Lin Gui, Ruifeng Xu
NeurIPS 2024 BiScope: AI-Generated Text Detection by Checking Memorization of Preceding Tokens. Hanxi Guo, Siyuan Cheng, Xiaolong Jin, Zhuo Zhang, Kaiyuan Zhang, Guanhong Tao, Guangyu Shen, Xiangyu Zhang
NeurIPS 2024 Detecting Bugs with Substantial Monetary Consequences by LLM and Rule-Based Reasoning. Brian Zhang, Zhuo Zhang
NeurIPSW 2024 MultiVerse: Exposing Large Language Model Alignment Problems in Diverse Worlds. Xiaolong Jin, Zhuo Zhang, Guangyu Shen, Hanxi Guo, Kaiyuan Zhang, Siyuan Cheng, Xiangyu Zhang
NeurIPSW 2024 SkewAct: Red Teaming Large Language Models via Activation-Skewed Adversarial Prompt Optimization. Hanxi Guo, Siyuan Cheng, Guanhong Tao, Guangyu Shen, Zhuo Zhang, Shengwei An, Kaiyuan Zhang, Xiangyu Zhang
NeurIPS 2023 ParaFuzz: An Interpretability-Driven Technique for Detecting Poisoned Samples in NLP. Lu Yan, Zhuo Zhang, Guanhong Tao, Kaiyuan Zhang, Xuan Chen, Guangyu Shen, Xiangyu Zhang
ICML 2022 Constrained Optimization with Dynamic Bound-Scaling for Effective NLP Backdoor Defense. Guangyu Shen, Yingqi Liu, Guanhong Tao, Qiuling Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xiangyu Zhang