Filan, Daniel

5 publications

ICML 2025. Constrained Belief Updates Explain Geometric Structures in Transformer Representations. Mateusz Piotrowski, Paul M. Riechers, Daniel Filan, Adam Shai
NeurIPSW 2024. Constrained Belief Updating and Geometric Structures in Transformer Representations. Mateusz Piotrowski, Paul M. Riechers, Daniel Filan, Adam Shai
NeurIPSW 2024. Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Capabilities. Zora Che, Stephen Casper, Anirudh Satheesh, Rohit Gandikota, Domenic Rosati, Stewart Slocum, Lev E McKinney, Zichu Wu, Zikui Cai, Bilal Chughtai, Daniel Filan, Furong Huang, Dylan Hadfield-Menell
ICLRW 2022. Graphical Clusterability and Local Specialization in Deep Neural Networks. Stephen Casper, Shlomi Hod, Daniel Filan, Cody Wild, Andrew Critch, Stuart Russell
AISTATS 2016. Loss Bounds and Time Complexity for Speed Priors. Daniel Filan, Jan Leike, Marcus Hutter