Carlini, Nicholas

47 publications

ICLR 2025: Adversarial Perturbations Cannot Reliably Protect Artists from Generative AI. Robert Hönig, Javier Rando, Nicholas Carlini, Florian Tramèr
ICML 2025: AutoAdvExBench: Benchmarking Autonomous Exploitation of Adversarial Example Defenses. Nicholas Carlini, Edoardo Debenedetti, Javier Rando, Milad Nasr, Florian Tramèr
ICML 2025: Exploring and Mitigating Adversarial Manipulation of Voting-Based Leaderboards. Yangsibo Huang, Milad Nasr, Anastasios Nikolas Angelopoulos, Nicholas Carlini, Wei-Lin Chiang, Christopher A. Choquette-Choo, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Ken Liu, Ion Stoica, Florian Tramèr, Chiyuan Zhang
NeurIPS 2025: IF-Guide: Influence Function-Guided Detoxification of LLMs. Zachary Coalson, Juhan Bae, Nicholas Carlini, Sanghyun Hong
ICLR 2025: Measuring Non-Adversarial Reproduction of Training Data in Large Language Models. Michael Aerni, Javier Rando, Edoardo Debenedetti, Nicholas Carlini, Daphne Ippolito, Florian Tramèr
ICLR 2025: On Evaluating the Durability of Safeguards for Open-Weight LLMs. Xiangyu Qi, Boyi Wei, Nicholas Carlini, Yangsibo Huang, Tinghao Xie, Luxi He, Matthew Jagielski, Milad Nasr, Prateek Mittal, Peter Henderson
ICLR 2025: Persistent Pre-Training Poisoning of LLMs. Yiming Zhang, Javier Rando, Ivan Evtimov, Jianfeng Chi, Eric Michael Smith, Nicholas Carlini, Florian Tramèr, Daphne Ippolito
ICML 2025: Position: In-House Evaluation Is Not Enough. Towards Robust Third-Party Evaluation and Flaw Disclosure for General-Purpose AI. Shayne Longpre, Kevin Klyman, Ruth Elisabeth Appel, Sayash Kapoor, Rishi Bommasani, Michelle Sahar, Sean McGregor, Avijit Ghosh, Borhane Blili-Hamelin, Nathan Butters, Alondra Nelson, Dr. Amit Elazari, Andrew Sellars, Casey John Ellis, Dane Sherrets, Dawn Song, Harley Geiger, Ilona Cohen, Lauren McIlvenny, Madhulika Srikumar, Mark M. Jaycox, Markus Anderljung, Nadine Farid Johnson, Nicholas Carlini, Nicolas Miailhe, Nik Marda, Peter Henderson, Rebecca S. Portnoff, Rebecca Weiss, Victoria Westerhoff, Yacine Jernite, Rumman Chowdhury, Percy Liang, Arvind Narayanan
ICLR 2025: Scalable Extraction of Training Data from Aligned, Production Language Models. Milad Nasr, Javier Rando, Nicholas Carlini, Jonathan Hayase, Matthew Jagielski, A. Feder Cooper, Daphne Ippolito, Christopher A. Choquette-Choo, Florian Tramèr, Katherine Lee
CVPR 2024: Initialization Matters for Adversarial Transfer Learning. Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin
ICML 2024: Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining. Florian Tramèr, Gautam Kamath, Nicholas Carlini
NeurIPS 2024: Privacy Backdoors: Enhancing Membership Inference Through Poisoning Pre-Trained Models. Yuxin Wen, Leo Marchyok, Sanghyun Hong, Jonas Geiping, Tom Goldstein, Nicholas Carlini
NeurIPS 2024: Query-Based Adversarial Prompt Generation. Jonathan Hayase, Ema Borevkovic, Nicholas Carlini, Florian Tramèr, Milad Nasr
ICML 2024: Stealing Part of a Production Language Model. Nicholas Carlini, Daniel Paleka, Krishnamurthy Dj Dvijotham, Thomas Steinke, Jonathan Hayase, A. Feder Cooper, Katherine Lee, Matthew Jagielski, Milad Nasr, Arthur Conmy, Eric Wallace, David Rolnick, Florian Tramèr
ICLR 2023: (Certified!!) Adversarial Robustness for Free! Nicholas Carlini, Florian Tramèr, Krishnamurthy Dj Dvijotham, Leslie Rice, Mingjie Sun, J. Zico Kolter
NeurIPS 2023: Are Aligned Neural Networks Adversarially Aligned? Nicholas Carlini, Milad Nasr, Christopher A. Choquette-Choo, Matthew Jagielski, Irena Gao, Pang Wei Koh, Daphne Ippolito, Florian Tramèr, Ludwig Schmidt
ICMLW 2023: Backdoor Attacks for In-Context Learning with Language Models. Nikhil Kandpal, Matthew Jagielski, Florian Tramèr, Nicholas Carlini
NeurIPS 2023: Counterfactual Memorization in Neural Language Models. Chiyuan Zhang, Daphne Ippolito, Katherine Lee, Matthew Jagielski, Florian Tramèr, Nicholas Carlini
NeurIPS 2023: Effective Robustness Against Natural Distribution Shifts for Models with Different Training Data. Zhouxing Shi, Nicholas Carlini, Ananth Balashankar, Ludwig Schmidt, Cho-Jui Hsieh, Alex Beutel, Yao Qin
ICMLW 2023: Evading Black-Box Classifiers Without Breaking Eggs. Edoardo Debenedetti, Nicholas Carlini, Florian Tramèr
ICLR 2023: Measuring Forgetting of Memorized Training Examples. Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Guha Thakurta, Nicolas Papernot, Chiyuan Zhang
ICLR 2023: Part-Based Models Improve Adversarial Robustness. Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, David Wagner
ICML 2023: Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems. Chawin Sitawarin, Florian Tramèr, Nicholas Carlini
ICLR 2023: Quantifying Memorization Across Neural Language Models. Nicholas Carlini, Daphne Ippolito, Matthew Jagielski, Katherine Lee, Florian Tramèr, Chiyuan Zhang
NeurIPS 2023: Students Parrot Their Teachers: Membership Inference on Model Distillation. Matthew Jagielski, Milad Nasr, Katherine Lee, Christopher A. Choquette-Choo, Nicholas Carlini, Florian Tramèr
ICLR 2022: AdaMatch: A Unified Approach to Semi-Supervised Learning and Domain Adaptation. David Berthelot, Rebecca Roelofs, Kihyuk Sohn, Nicholas Carlini, Alexey Kurakin
ICLR 2022: Data Poisoning Won’t Save You from Facial Recognition. Evani Radiya-Dixit, Sanghyun Hong, Nicholas Carlini, Florian Tramèr
ICLR 2022: Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent. Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini
NeurIPS 2022: Handcrafted Backdoors in Deep Neural Networks. Sanghyun Hong, Nicholas Carlini, Alexey Kurakin
NeurIPS 2022: Increasing Confidence in Adversarial Robustness Evaluations. Roland S. Zimmermann, Wieland Brendel, Florian Tramèr, Nicholas Carlini
NeurIPS 2022: Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples. Maura Pintor, Luca Demetrio, Angelo Sotgiu, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli
NeurIPSW 2022: Part-Based Models Improve Adversarial Robustness. Chawin Sitawarin, Kornrapat Pongmala, Yizheng Chen, Nicholas Carlini, David Wagner
ICLR 2022: Poisoning and Backdooring Contrastive Learning. Nicholas Carlini, Andreas Terzis
NeurIPS 2022: The Privacy Onion Effect: Memorization Is Relative. Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramèr
ICMLW 2021: Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples. Maura Pintor, Luca Demetrio, Angelo Sotgiu, Giovanni Manca, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli
ICML 2021: Label-Only Membership Inference Attacks. Christopher A. Choquette-Choo, Florian Tramèr, Nicholas Carlini, Nicolas Papernot
NeurIPSW 2021: Measuring Robustness to Natural Distribution Shifts in Image Classification. Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt
CVPRW 2020: Evading Deepfake-Image Detectors with White- and Black-Box Attacks. Nicholas Carlini, Hany Farid
NeurIPS 2020: FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. Kihyuk Sohn, David Berthelot, Nicholas Carlini, Zizhao Zhang, Han Zhang, Colin Raffel, Ekin D. Cubuk, Alexey Kurakin, Chun-Liang Li
ICML 2020: Fundamental Tradeoffs Between Invariance and Sensitivity to Adversarial Perturbations. Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen
NeurIPS 2020: Measuring Robustness to Natural Distribution Shifts in Image Classification. Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, Ludwig Schmidt
NeurIPS 2020: On Adaptive Attacks to Adversarial Example Defenses. Florian Tramèr, Nicholas Carlini, Wieland Brendel, Aleksander Madry
ICLR 2020: ReMixMatch: Semi-Supervised Learning with Distribution Matching and Augmentation Anchoring. David Berthelot, Nicholas Carlini, Ekin D. Cubuk, Alexey Kurakin, Kihyuk Sohn, Han Zhang, Colin Raffel
ICML 2019: Adversarial Examples Are a Natural Consequence of Test Error in Noise. Justin Gilmer, Nicolas Ford, Nicholas Carlini, Ekin D. Cubuk
ICML 2019: Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition. Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, Colin Raffel
NeurIPS 2019: MixMatch: A Holistic Approach to Semi-Supervised Learning. David Berthelot, Nicholas Carlini, Ian Goodfellow, Nicolas Papernot, Avital Oliver, Colin Raffel
ICML 2018: Obfuscated Gradients Give a False Sense of Security: Circumventing Defenses to Adversarial Examples. Anish Athalye, Nicholas Carlini, David Wagner