McKenzie, Alex

1 publication

NeurIPS 2025 Detecting High-Stakes Interactions with Activation Probes Alex McKenzie, Urja Pawar, Phil Blandfort, William Bankes, David Krueger, Ekdeep Singh Lubana, Dmitrii Krasheninnikov