Robey, Alexander

23 publications

TMLR 2026 Steering Dialogue Dynamics for Robustness Against Multi-Turn Jailbreaking Attacks Hanjiang Hu, Alexander Robey, Changliu Liu
NeurIPS 2025 Antidistillation Sampling Yash Savani, Asher Trockman, Zhili Feng, Yixuan Even Xu, Avi Schwarzschild, Alexander Robey, Marc Anton Finzi, J Zico Kolter
TMLR 2025 Automated Black-Box Prompt Engineering for Personalized Text-to-Image Generation Yutong He, Alexander Robey, Naoki Murata, Yiding Jiang, Joshua Nathaniel Williams, George J. Pappas, Hamed Hassani, Yuki Mitsufuji, Ruslan Salakhutdinov, J Zico Kolter
NeurIPS 2025 Emerging Risks from Embodied AI Require Urgent Policy Action Jared Perlo, Alexander Robey, Fazl Barez, Jakob Mökander
ICLRW 2025 Evaluating LLM Memorization Using Soft Token Sparsity Zhili Feng, Yixuan Even Xu, Pratyush Maini, Alexander Robey, Avi Schwarzschild, J Zico Kolter
NeurIPS 2025 Safety Pretraining: Toward the Next Generation of Safe AI Pratyush Maini, Sachin Goyal, Dylan Sam, Alexander Robey, Yash Savani, Yiding Jiang, Andy Zou, Matt Fredrikson, Zachary Chase Lipton, J Zico Kolter
TMLR 2025 SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
ICLR 2024 Adversarial Training Should Be Cast as a Non-Zero-Sum Game Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher
NeurIPS 2024 JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, Eric Wong
ICMLW 2024 JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Patrick Chao, Edoardo Debenedetti, Alexander Robey, Maksym Andriushchenko, Francesco Croce, Vikash Sehwag, Edgar Dobriban, Nicolas Flammarion, George J. Pappas, Florian Tramèr, Hamed Hassani, Eric Wong
ICML 2024 Position: A Safe Harbor for AI Evaluation and Red Teaming Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, Borhane Blili-Hamelin, Yangsibo Huang, Aviya Skowron, Zheng Xin Yong, Suhas Kotha, Yi Zeng, Weiyan Shi, Xianjun Yang, Reid Southen, Alexander Robey, Patrick Chao, Diyi Yang, Ruoxi Jia, Daniel Kang, Alex Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
ICMLW 2023 Adversarial Training Should Be Cast as a Non-Zero-Sum Game Alexander Robey, Fabian Latorre, George J. Pappas, Hamed Hassani, Volkan Cevher
NeurIPSW 2023 Jailbreaking Black Box Large Language Models in Twenty Queries Patrick Chao, Alexander Robey, Edgar Dobriban, Hamed Hassani, George J. Pappas, Eric Wong
NeurIPSW 2023 SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks Alexander Robey, Eric Wong, Hamed Hassani, George J. Pappas
ICLR 2022 Do Deep Networks Transfer Invariances Across Classes? Allan Zhou, Fahim Tajwar, Alexander Robey, Tom Knowles, George J. Pappas, Hamed Hassani, Chelsea Finn
L4DC 2022 On the Sample Complexity of Stability Constrained Imitation Learning Stephen Tu, Alexander Robey, Tingnan Zhang, Nikolai Matni
ICML 2022 Probabilistically Robust Learning: Balancing Average and Worst-Case Performance Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani
NeurIPS 2022 Probable Domain Generalization via Quantile Risk Minimization Cian Eastwood, Alexander Robey, Shashank Singh, Julius von Kügelgen, Hamed Hassani, George J. Pappas, Bernhard Schölkopf
NeurIPS 2021 Adversarial Robustness with Semi-Infinite Constrained Learning Alexander Robey, Luiz Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro
NeurIPS 2021 Model-Based Domain Generalization Alexander Robey, George J. Pappas, Hamed Hassani
L4DC 2021 Optimal Algorithms for Submodular Maximization with Distributed Constraints Alexander Robey, Arman Adibi, Brent Schlotfeldt, Hamed Hassani, George J. Pappas
CoRL 2020 Learning Hybrid Control Barrier Functions from Data Lars Lindemann, Haimin Hu, Alexander Robey, Hanwen Zhang, Dimos Dimarogonas, Stephen Tu, Nikolai Matni
NeurIPS 2019 Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks Mahyar Fazlyab, Alexander Robey, Hamed Hassani, Manfred Morari, George J. Pappas