Petrov, Aleksandar

16 publications

ICLR 2025 Do as I Do (Safely): Mitigating Task-Specific Fine-Tuning Risks in Large Language Models Francisco Eiras, Aleksandar Petrov, Philip Torr, M. Pawan Kumar, Adel Bibi

AAAI 2025 Language-Models-as-a-Service: Overview of a New Paradigm and Its Challenges Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G. Cohn, Nigel Shadbolt, Michael J. Wooldridge

NeurIPS 2025 On the Coexistence and Ensembling of Watermarks Aleksandar Petrov, Shruti Agarwal, Philip Torr, Adel Bibi, John Collomosse

ICLRW 2025 On the Coexistence and Ensembling of Watermarks Aleksandar Petrov, Shruti Agarwal, Philip Torr, Adel Bibi, John Collomosse

TMLR 2024 Foundational Challenges in Assuring Alignment and Safety of Large Language Models Usman Anwar, Abulhair Saparov, Javier Rando, Daniel Paleka, Miles Turpin, Peter Hase, Ekdeep Singh Lubana, Erik Jenner, Stephen Casper, Oliver Sourbut, Benjamin L. Edelman, Zhaowei Zhang, Mario Günther, Anton Korinek, Jose Hernandez-Orallo, Lewis Hammond, Eric J Bigelow, Alexander Pan, Lauro Langosco, Tomasz Korbak, Heidi Chenyu Zhang, Ruiqi Zhong, Sean O hEigeartaigh, Gabriel Recchia, Giulio Corsi, Alan Chan, Markus Anderljung, Lilian Edwards, Aleksandar Petrov, Christian Schroeder de Witt, Sumeet Ramesh Motwani, Yoshua Bengio, Danqi Chen, Philip Torr, Samuel Albanie, Tegan Maharaj, Jakob Nicolaus Foerster, Florian Tramèr, He He, Atoosa Kasirzadeh, Yejin Choi, David Krueger

JAIR 2024 Language-Models-as-a-Service: Overview of a New Paradigm and Its Challenges Emanuele La Malfa, Aleksandar Petrov, Simon Frieder, Christoph Weinhuber, Ryan Burnell, Raza Nazar, Anthony G. Cohn, Nigel Shadbolt, Michael J. Wooldridge

ICMLW 2024 Mimicking User Data: On Mitigating Fine-Tuning Risks in Closed Large Language Models Francisco Eiras, Aleksandar Petrov, Philip Torr, M. Pawan Kumar, Adel Bibi

ICML 2024 Position: Near to Mid-Term Risks and Opportunities of Open-Source Generative AI Francisco Eiras, Aleksandar Petrov, Bertie Vidgen, Christian Schroeder De Witt, Fabio Pizzati, Katherine Elkins, Supratik Mukhopadhyay, Adel Bibi, Botos Csaba, Fabro Steibel, Fazl Barez, Genevieve Smith, Gianluca Guadagni, Jon Chun, Jordi Cabot, Joseph Marvin Imperial, Juan A. Nolazco-Flores, Lori Landay, Matthew Thomas Jackson, Paul Rottger, Philip Torr, Trevor Darrell, Yong Suk Lee, Jakob Nicolaus Foerster

ICML 2024 Prompting a Pretrained Transformer Can Be a Universal Approximator Aleksandar Petrov, Philip Torr, Adel Bibi

NeurIPS 2024 Universal In-Context Approximation by Prompting Fully Recurrent Models Aleksandar Petrov, Tom A. Lamb, Alasdair Paren, Philip H.S. Torr, Adel Bibi

ICLR 2024 When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations Aleksandar Petrov, Philip Torr, Adel Bibi

ICML 2023 Certifying Ensembles: A General Certification Theory with S-Lipschitzness Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip Torr, Adel Bibi

ICMLW 2023 Certifying Ensembles: A General Certification Theory with S-Lipschitzness Aleksandar Petrov, Francisco Eiras, Amartya Sanyal, Philip Torr, Adel Bibi

NeurIPS 2023 Language Model Tokenizers Introduce Unfairness Between Languages Aleksandar Petrov, Emanuele La Malfa, Philip Torr, Adel Bibi

ICMLW 2023 Language Model Tokenizers Introduce Unfairness Between Languages Aleksandar Petrov, Emanuele La Malfa, Philip Torr, Adel Bibi

NeurIPSW 2023 When Do Prompting and Prefix-Tuning Work? A Theory of Capabilities and Limitations Aleksandar Petrov, Philip Torr, Adel Bibi