Soligo, Anna

2 publications

ICLR 2026. Emergent Misalignment Is Easy, Narrow Misalignment Is Hard. Anna Soligo, Edward Turner, Senthooran Rajamanoharan, Neel Nanda.
ICML 2025. Inducing, Detecting and Characterising Neural Modules: A Pipeline for Functional Interpretability in Reinforcement Learning. Anna Soligo, Pietro Ferraro, David Boyle.