He, Xuanli

7 publications

ICLR 2025 An Auditing Test to Detect Behavioral Shift in Language Models Leo Richter, Xuanli He, Pasquale Minervini, Matt Kusner
ICMLW 2024 An Auditing Test to Detect Behavioral Shift in Language Models Leo Richter, Nitin Agrawal, Xuanli He, Pasquale Minervini, Matt Kusner
NeurIPSW 2024 Analysing the Residual Stream of Language Models Under Knowledge Conflicts Yu Zhao, Xiaotang Du, Giwon Hong, Aryo Pradipta Gema, Alessio Devoto, Hongru Wang, Xuanli He, Kam-Fai Wong, Pasquale Minervini
ICLRW 2024 Attacks on Third-Party APIs of Large Language Models Wanru Zhao, Vidit Khazanchi, Haodi Xing, Xuanli He, Qiongkai Xu, Nicholas Donald Lane
TMLR 2024 Generative Models Are Self-Watermarked: Declaring Model Authentication Through Re-Generation Aditya Desu, Xuanli He, Qiongkai Xu, Wei Lu
NeurIPS 2022 CATER: Intellectual Property Protection on Text Generation APIs via Conditional Watermarks Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, Ruoxi Jia
AAAI 2022 Protecting Intellectual Property of Language Generation APIs with Lexical Watermark Xuanli He, Qiongkai Xu, Lingjuan Lyu, Fangzhao Wu, Chenguang Wang