Petrov, Aleksandar

16 publications

ICLR 2025. Do as I Do (Safely): Mitigating Task-Specific Fine-Tuning Risks in Large Language Models. Francisco Eiras, Aleksandar Petrov, Philip Torr, M. Pawan Kumar, Adel Bibi.

AAAI 2025. Language-Models-as-a-Service: Overview of a New Paradigm and Its Challenges. Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G. Cohn, Nigel Shadbolt, Michael J. Wooldridge.

NeurIPS 2025. On the Coexistence and Ensembling of Watermarks. Aleksandar Petrov, Shruti Agarwal, Philip Torr, Adel Bibi, John Collomosse.

ICLRW 2025. On the Coexistence and Ensembling of Watermarks. Aleksandar Petrov, Shruti Agarwal, Philip Torr, Adel Bibi, John Collomosse.

TMLR 2024. Foundational Challenges in Assuring Alignment and Safety of Large Language Models. Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric J Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Chenyu Zhang, Ruiqi Zhong, Sean O hEigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwani, Yoshua Bengio, Danqi Chen, Philip Torr, Samuel Albanie, Tegan Maharaj, Jakob Nicolaus Foerster, Florian Tramèr, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger.

JAIR 2024. Language-Models-as-a-Service: Overview of a New Paradigm and Its Challenges. Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G. Cohn, Nigel Shadbolt, Michael J. Wooldridge.

ICMLW 2024. Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models. Francisco Eiras, Aleksandar Petrov, Philip Torr, M. Pawan Kumar, Adel Bibi.

ICML 2024. Position: Near to Mid-Term Risks and Opportunities of Open-Source Generative AI. Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder De Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Thomas Jackson, Paul Rottger, Philip Torr, Trevor Darrell, Yong Suk Lee, Jakob Nicolaus Foerster.

ICML 2024. Prompting a Pretrained Transformer Can Be a Universal Approximator. Aleksandar Petrov, Philip Torr, Adel Bibi.

NeurIPS 2024. Universal In-Context Approximation by Prompting Fully Recurrent Models. Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H.S. Torr, Adel Bibi.

ICLR 2024. When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations. Aleksandar Petrov, Philip Torr, Adel Bibi.

ICML 2023. Certifying Ensembles: A General Certification Theory with S-Lipschitzness. Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip Torr, Adel Bibi.

ICMLW 2023. Certifying Ensembles: A General Certification Theory with S-Lipschitzness. Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip Torr, Adel Bibi.

NeurIPS 2023. Language Model Tokenizers Introduce Unfairness Between Languages. Aleksandar Petrov, Emanuele La Malfa, Philip Torr, Adel Bibi.

ICMLW 2023. Language Model Tokenizers Introduce Unfairness Between Languages. Aleksandar Petrov, Emanuele La Malfa, Philip Torr, Adel Bibi.

NeurIPSW 2023. When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations. Aleksandar Petrov, Philip Torr, Adel Bibi.