Backes, Michael

27 publications

NeurIPS 2025. Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency. Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang.
ICLR 2025. Captured by Captions: On Memorization and Its Mitigation in CLIP Models. Wenhao Wang, Adam Dziedzic, Grace C. Kim, Michael Backes, Franziska Boenisch.
ICLRW 2025. Captured by Captions: On Memorization and Its Mitigation in CLIP Models. Wenhao Wang, Adam Dziedzic, Grace C. Kim, Michael Backes, Franziska Boenisch.
ICML 2025. Efficient and Privacy-Preserving Soft Prompt Transfer for LLMs. Xun Wang, Jing Xu, Franziska Boenisch, Michael Backes, Christopher A. Choquette-Choo, Adam Dziedzic.
NeurIPS 2025. Finding and Reactivating Post-Trained LLMs' Hidden Safety Mechanisms. Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang.
ICCV 2025. Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions. Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang.
ICML 2025. Provably Cost-Sensitive Adversarial Defense via Randomized Smoothing. Yuan Xin, Dingfan Chen, Michael Backes, Xiao Zhang.
ICLR 2025. SaLoRA: Safety-Alignment Preserved Low-Rank Adaptation. Mingjie Li, Wai Man Si, Michael Backes, Yang Zhang, Yisen Wang.
NeurIPSW 2024. Auditing Empirical Privacy Protection of Private LLM Adaptations. Lorenzo Rossi, Bartłomiej Marek, Vincent Hanke, Xun Wang, Michael Backes, Adam Dziedzic, Franziska Boenisch.
WACV 2024. Generated Distributions Are All You Need for Membership Inference Attacks Against Generative Models. Minxing Zhang, Ning Yu, Rui Wen, Michael Backes, Yang Zhang.
TMLR 2024. Generating Less Certain Adversarial Examples Improves Robust Generalization. Minxing Zhang, Michael Backes, Xiao Zhang.
NeurIPS 2024. Localizing Memorization in SSL Vision Encoders. Wenhao Wang, Adam Dziedzic, Michael Backes, Franziska Boenisch.
ICLR 2024. Memorization in Self-Supervised Learning Improves Downstream Generalization. Wenhao Wang, Muhammad Ahmad Kaleem, Adam Dziedzic, Michael Backes, Nicolas Papernot, Franziska Boenisch.
NeurIPS 2024. Open LLMs Are Necessary for Current Private Adaptations and Outperform Their Closed Alternatives. Vincent Hanke, Tom Blanchard, Franziska Boenisch, Iyiola E. Olatunji, Michael Backes, Adam Dziedzic.
ICMLW 2024. Open LLMs Are Necessary for Private Adaptations and Outperform Their Closed Alternatives. Vincent Hanke, Tom Blanchard, Franziska Boenisch, Iyiola Emmanuel Olatunji, Michael Backes, Adam Dziedzic.
ICMLW 2024. POST: A Framework for Privacy of Soft-Prompt Transfer. Xun Wang, Jing Xu, Franziska Boenisch, Michael Backes, Adam Dziedzic.
ICML 2024. Position: TrustLLM: Trustworthiness in Large Language Models. Yue Huang, Lichao Sun, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Hanchi Sun, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric P. Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Yang Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, Yue Zhao.
MLJ 2023. Adversarial Vulnerability Bounds for Gaussian Process Classification. Michael Thomas Smith, Kathrin Grosse, Michael Backes, Mauricio A. Álvarez.
CVPR 2023. Can't Steal? Cont-Steal! Contrastive Stealing Attacks Against Image Encoders. Zeyang Sha, Xinlei He, Ning Yu, Michael Backes, Yang Zhang.
ICML 2023. Data Poisoning Attacks Against Multimodal Encoders. Ziqing Yang, Xinlei He, Zheng Li, Michael Backes, Mathias Humbert, Pascal Berrang, Yang Zhang.
ICML 2023. Generated Graph Detection. Yihan Ma, Zhikun Zhang, Ning Yu, Xinlei He, Michael Backes, Yun Shen, Yang Zhang.
ICLR 2023. Is Adversarial Training Really a Silver Bullet for Mitigating Data Poisoning? Rui Wen, Zhengyu Zhao, Zhuoran Liu, Michael Backes, Tianhao Wang, Yang Zhang.
ICMLW 2023. Provably Robust Cost-Sensitive Learning via Randomized Smoothing. Yuan Xin, Michael Backes, Xiao Zhang.
ICMLW 2021. BadNL: Backdoor Attacks Against NLP Models. Xiaoyi Chen, Ahmed Salem, Michael Backes, Shiqing Ma, Yang Zhang.
CVPRW 2021. MLCapsule: Guarded Offline Deployment of Machine Learning as a Service. Lucjan Hanzlik, Yang Zhang, Kathrin Grosse, Ahmed Salem, Maximilian Augustin, Michael Backes, Mario Fritz.
IJCAI 2019. Fairwalk: Towards Fair Graph Embedding. Tahleen A. Rahman, Bartlomiej Surma, Michael Backes, Yang Zhang.
AAAI 2018. Stackelberg Planning: Towards Effective Leader-Follower State Space Search. Patrick Speicher, Marcel Steinmetz, Michael Backes, Jörg Hoffmann, Robert Künnemann.