Kirk, Robert

26 publications

NeurIPS 2025 Fundamental Limitations in Pointwise Defences of LLM Finetuning APIs Xander Davies, Eric Winsor, Alexandra Souly, Tomek Korbak, Robert Kirk, Christian Schroeder de Witt, Yarin Gal
ICML 2025 How Do Large Language Monkeys Get Their Power (Laws)? Rylan Schaeffer, Joshua Kazdan, John Hughes, Jordan Juravsky, Sara Price, Aengus Lynch, Erik Jones, Robert Kirk, Azalia Mirhoseini, Sanmi Koyejo
ICML 2025 Investigating Non-Transitivity in LLM-as-a-Judge Yi Xu, Laura Ruis, Tim Rocktäschel, Robert Kirk
TMLR 2025 Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities Zora Che, Stephen Casper, Robert Kirk, Anirudh Satheesh, Stewart Slocum, Lev E McKinney, Rohit Gandikota, Aidan Ewart, Domenic Rosati, Zichu Wu, Zikui Cai, Bilal Chughtai, Yarin Gal, Furong Huang, Dylan Hadfield-Menell
ICLR 2025 Procedural Knowledge in Pretraining Drives Reasoning in Large Language Models Laura Ruis, Maximilian Mozes, Juhan Bae, Siddhartha Rao Kamalakara, Dwaraknath Gnaneshwar, Acyr Locatelli, Robert Kirk, Tim Rocktäschel, Edward Grefenstette, Max Bartolo
NeurIPS 2025 Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition Andy Zou, Maxwell Lin, Eliot Krzysztof Jones, Micha V. Nowak, Mateusz Dziemian, Nick Winter, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Yarin Gal, Dan Hendrycks, J Zico Kolter, Matt Fredrikson
NeurIPS 2024 Analysing the Generalisation and Reliability of Steering Vectors Daniel Tan, David Chanin, Aengus Lynch, Brooks Paige, Dimitrios Kanoulas, Adrià Garriga-Alonso, Robert Kirk
ICMLW 2024 Analyzing the Generalization and Reliability of Steering Vectors Daniel Chee Hian Tan, David Chanin, Aengus Lynch, Adrià Garriga-Alonso, Dimitrios Kanoulas, Brooks Paige, Robert Kirk
ICML 2024 Generalization to New Sequential Decision Making Tasks with In-Context Learning Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
ICLR 2024 Mechanistically Analyzing the Effects of Fine-Tuning on Procedurally Defined Tasks Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tim Rocktäschel, Edward Grefenstette, David Krueger
ICLRW 2024 Mechanistically Analyzing the Effects of Fine-Tuning on Procedurally Defined Tasks Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tim Rocktäschel, Edward Grefenstette, David Krueger
ICLR 2024 Reward Model Ensembles Help Mitigate Overoptimization Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
ICLR 2024 Understanding the Effects of RLHF on LLM Generalisation and Diversity Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
JAIR 2023 A Survey of Zero-Shot Generalisation in Deep Reinforcement Learning Robert Kirk, Amy Zhang, Edward Grefenstette, Tim Rocktäschel
NeurIPSW 2023 How Does Fine-Tuning Affect Your Model? Mechanistic Analysis on Procedural Tasks Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tim Rocktäschel, Edward Grefenstette, David Krueger
NeurIPSW 2023 How Does Fine-Tuning Affect Your Model? Mechanistic Analysis on Procedural Tasks Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tim Rocktäschel, Edward Grefenstette, David Krueger
NeurIPSW 2023 Leading the Pack: N-Player Opponent Shaping Alexandra Souly, Timon Willi, Akbir Khan, Robert Kirk, Chris Lu, Edward Grefenstette, Tim Rocktäschel
NeurIPSW 2023 Learning to Solve New Sequential Decision-Making Tasks with In-Context Learning Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
NeurIPSW 2023 Reward Model Ensembles Help Mitigate Overoptimization Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
NeurIPSW 2023 Reward Model Ensembles Help Mitigate Overoptimization Thomas Coste, Usman Anwar, Robert Kirk, David Krueger
NeurIPSW 2023 Understanding the Effects of RLHF on LLM Generalisation and Diversity Robert Kirk, Ishita Mediratta, Christoforos Nalmpantis, Jelena Luketina, Eric Hambro, Edward Grefenstette, Roberta Raileanu
NeurIPSW 2023 What Mechanisms Does Knowledge Distillation Distill? Cindy Wu, Ekdeep Singh Lubana, Bruno Kacper Mlodozeniec, Robert Kirk, David Krueger
ICLRW 2022 A Study of Off-Policy Learning in Environments with Procedural Content Generation Andy Ehrenberg, Robert Kirk, Minqi Jiang, Edward Grefenstette, Tim Rocktäschel
NeurIPSW 2022 Domain Generalization for Robust Model-Based Offline Reinforcement Learning Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger
NeurIPSW 2022 Domain Generalization for Robust Model-Based Offline Reinforcement Learning Alan Clark, Shoaib Ahmed Siddiqui, Robert Kirk, Usman Anwar, Stephen Chung, David Krueger
NeurIPSW 2021 Graph Backup: Data Efficient Backup Exploiting Markovian Data Zhengyao Jiang, Tianjun Zhang, Robert Kirk, Tim Rocktäschel, Edward Grefenstette