Schoots, Nandi

9 publications

TMLR 2025 Open Problems in Mechanistic Interpretability Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeffrey Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Isaac Bloom, Stella Biderman, Adrià Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Mary Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, William Saunders, Eric J Michaud, Stephen Casper, Max Tegmark, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Thomas McGrath
AISTATS 2025 Relating Piecewise Linear Kolmogorov Arnold Networks to ReLU Networks Nandi Schoots, Mattia Jacopo Villani, Niels Bos
NeurIPSW 2024 Emergence of Steganography Between Large Language Models Yohan Mathew, Robert McCarthy, Joan Velja, Ollie Matthews, Nandi Schoots, Dylan Cope
NeurIPSW 2024 Hidden in Plain Text: Emergence & Mitigation of Steganographic Collusion in LLMs Yohan Mathew, Ollie Matthews, Robert McCarthy, Joan Velja, Christian Schroeder de Witt, Dylan Cope, Nandi Schoots
NeurIPSW 2024 Steganography in Large Language Models: Investigating Emergence and Mitigations Yohan Mathew, Robert McCarthy, Ollie Matthews, Joan Velja, Nandi Schoots, Dylan Cope
NeurIPSW 2024 Training Neural Networks for Modularity Aids Interpretability Satvik Golechha, Dylan Cope, Nandi Schoots
ICML 2023 A Theory of Representation Learning Gives a Deep Generalisation of Kernel Methods Adam X. Yang, Maxime Robeyns, Edward Milsom, Ben Anson, Nandi Schoots, Laurence Aitchison
NeurIPSW 2023 Comparing Optimization Targets for Contrast-Consistent Search Hugo Fry, Seamus Fallows, Jamie Wright, Ian Fan, Nandi Schoots
NeurIPSW 2023 Dissecting Large Language Models Nicky Pochinkov, Nandi Schoots