Hobbhahn, Marius

11 publications

ICML 2025 Detecting Strategic Deception with Linear Probes Nicholas Goldowsky-Dill, Bilal Chughtai, Stefan Heimersheim, Marius Hobbhahn
NeurIPSW 2024 Analyzing Probabilistic Methods for Evaluating Agent Capabilities Axel Højmark, Govind Pimpale, Arjun Panickssery, Marius Hobbhahn, Jérémy Scheurer
ICLRW 2024 Large Language Models Can Strategically Deceive Their Users When Put Under Pressure Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn
NeurIPS 2024 Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs Rudolf Laine, Bilal Chughtai, Jan Betley, Kaivalya Hariharan, Jérémy Scheurer, Mikita Balesni, Marius Hobbhahn, Alexander Meinke, Owain Evans
ICML 2024 Position: Will We Run Out of Data? Limits of LLM Scaling Based on Human-Generated Data Pablo Villalobos, Anson Ho, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Marius Hobbhahn
ICMLW 2024 Using Degeneracy in the Loss Landscape for Mechanistic Interpretability Lucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius Hobbhahn
UAI 2022 Fast Predictive Uncertainty for Classification with Bayesian Deep Networks Marius Hobbhahn, Agustinus Kristiadi, Philipp Hennig
NeurIPSW 2022 Investigating Causal Understanding in LLMs Marius Hobbhahn, Tom Lieberum, David Seiler
NeurIPSW 2022 Reflection Mechanisms as an Alignment Target: A Survey Marius Hobbhahn, Eric Landgrebe, Elizabeth Barnes